COMPUTER VISION
Dana H. Ballard    Christopher M. Brown
Department of Computer Science
University of Rochester
Rochester, New York

PRENTICE-HALL, INC., Englewood Cliffs, New Jersey 07632

Library of Congress Cataloging in Publication Data

BALLARD, DANA HARRY.
Computer vision.
Bibliography: p.
Includes index.
1. Image processing. I. Brown, Christopher M. II. Title.
TA1632.B34   621.38'0414   81-20974
ISBN 0-13-165316-4   AACR2

Cover design by Robin Breite

© 1982 by Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632

All rights reserved. No part of this book may be reproduced in any form or by any means without permission in writing from the publisher.

Printed in the United States of America 10 9 8 7 6 5 4 3 2


PRENTICE-HALL INTERNATIONAL, INC., London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
PRENTICE-HALL OF SOUTHEAST ASIA PTE. LTD., Singapore
WHITEHALL BOOKS LIMITED, Wellington, New Zealand

Contents

Preface xiii
Acknowledgments xv
Mnemonics for Proceedings and Special Collections Cited in References xix

1 COMPUTER VISION
1.1 Achieving Simple Vision Goals 1
1.2 High-Level and Low-Level Capabilities 2
1.3 A Range of Representations 6
1.4 The Role of Computers 9
1.5 Computer Vision Research and Applications 12

Part I GENERALIZED IMAGES 13

2 IMAGE FORMATION
2.1 Images 17
2.2 Image Model 18
2.2.1 Image Functions, 18
2.2.2 Imaging Geometry, 19
2.2.3 Reflectance, 22
2.2.4 Spatial Properties, 24
2.2.5 Color, 31
2.2.6 Digital Images, 35
2.3 Imaging Devices for Computer Vision 42
2.3.1 Photographic Imaging, 44
2.3.2 Sensing Range, 52
2.3.3 Reconstruction Imaging, 56

3 EARLY PROCESSING
3.1 Recovering Intrinsic Structure 63
3.2 Filtering the Image 65
3.2.1 Template Matching, 65
3.2.2 Histogram Transformations, 70
3.2.3 Background Subtraction, 72
3.2.4 Filtering and Reflectance Models, 73
3.3 Finding Local Edges 75
3.3.1 Types of Edge Operators, 76
3.3.2 Edge Thresholding Strategies, 80
3.3.3 Three-Dimensional Edge Operators, 81
3.3.4 How Good Are Edge Operators? 83
3.3.5 Edge Relaxation, 85
3.4 Range Information from Geometry 88
3.4.1 Stereo Vision and Triangulation, 88
3.4.2 A Relaxation Algorithm for Stereo, 89
3.5 Surface Orientation from Reflectance Models 93
3.5.1 Reflectivity Functions, 93
3.5.2 Surface Gradient, 95
3.5.3 Photometric Stereo, 98
3.5.4 Shape from Shading by Relaxation, 99
3.6 Optical Flow 102
3.6.1 The Fundamental Flow Constraint, 102
3.6.2 Calculating Optical Flow by Relaxation, 103
3.7 Resolution Pyramids 106
3.7.1 Gray-Level Consolidation, 106
3.7.2 Pyramidal Structures in Correlation, 107
3.7.3 Pyramidal Structures in Edge Detection, 109

Part II SEGMENTED IMAGES 115

4 BOUNDARY DETECTION
4.1 On Associating Edge Elements 119
4.2 Searching Near an Approximate Location 121
4.2.1 Adjusting A Priori Boundaries, 121
4.2.2 Non-Linear Correlation in Edge Space, 121
4.2.3 Divide-and-Conquer Boundary Detection, 122
4.3 The Hough Method for Curve Detection 123
4.3.1 Use of the Gradient, 124
4.3.2 Some Examples, 125
4.3.3 Trading Off Work in Parameter Space for Work in Image Space, 126
4.3.4 Generalizing the Hough Transform, 128
4.4 Edge Following as Graph Searching 131
4.4.1 Good Evaluation Functions, 133
4.4.2 Finding All the Boundaries, 133
4.4.3 Alternatives to the A* Algorithm, 136
4.5 Edge Following as Dynamic Programming 137
4.5.1 Dynamic Programming, 137
4.5.2 Dynamic Programming for Images, 139
4.5.3 Lower Resolution Evaluation Functions, 141
4.5.4 Theoretical Questions about Dynamic Programming, 143
4.6 Contour Following 143
4.6.1 Extension to Gray-Level Images, 144
4.6.2 Generalization to Higher-Dimensional Image Data, 146

5 REGION GROWING
5.1 Regions 149
5.2 A Local Technique: Blob Coloring 151
5.3 Global Techniques: Region Growing via Thresholding
5.3.1 Thresholding in Multidimensional Space, 153
5.3.2 Hierarchical Refinement, 155
5.4 Splitting and Merging 155
5.4.1 State-Space Approach to Region Growing, 157
5.4.2 Low-Level Boundary Data Structures, 158
5.4.3 Graph-Oriented Region Structures, 159
5.5 Incorporation of Semantics 160

6 TEXTURE
6.1 What Is Texture? 166
6.2 Texture Primitives 169
6.3 Structural Models of Texel Placement 170
6.3.1 Grammatical Models, 172
6.3.2 Shape Grammars, 173
6.3.3 Tree Grammars, 175
6.3.4 Array Grammars, 178
6.4 Texture as a Pattern Recognition Problem 181
6.4.1 Texture Energy, 184
6.4.2 Spatial Gray-Level Dependence, 186
6.4.3 Region Texels, 188
6.5 The Texture Gradient 189

7 MOTION
7.1 Motion Understanding 195
7.1.1 Domain-Independent Understanding, 196
7.1.2 Domain-Dependent Understanding, 196
7.2 Understanding Optical Flow 199
7.2.1 Focus of Expansion, 199
7.2.2 Adjacency, Depth, and Collision, 201
7.2.3 Surface Orientation and Edge Detection, 202
7.2.4 Egomotion, 206
7.3 Understanding Image Sequences 207
7.3.1 Calculating Flow from Discrete Images, 207
7.3.2 Rigid Bodies from Motion, 210
7.3.3 Interpretation of Moving Light Displays: A Domain-Independent Approach, 214
7.3.4 Human Motion Understanding: A Model-Directed Approach, 217
7.3.5 Segmented Images, 220

Part III GEOMETRICAL STRUCTURES 227

8 REPRESENTATION OF TWO-DIMENSIONAL GEOMETRIC STRUCTURES
8.1 Two-Dimensional Geometric Structures 231
8.2 Boundary Representations 232
8.2.1 Polylines, 232
8.2.2 Chain Codes, 235
8.2.3 The ψ-s Curve, 237
8.2.4 Fourier Descriptors, 238
8.2.5 Conic Sections, 239
8.2.6 B-Splines, 239
8.2.7 Strip Trees, 244
8.3 Region Representations 247
8.3.1 Spatial Occupancy Array, 247
8.3.2 y-Axis, 248
8.3.3 Quad Trees, 249
8.3.4 Medial Axis Transform, 252
8.3.5 Decomposing Complex Areas, 253
8.4 Simple Shape Properties 254
8.4.1 Area, 254
8.4.2 Eccentricity, 255
8.4.3 Euler Number, 255
8.4.4 Compactness, 256
8.4.5 Slope Density Function, 256
8.4.6 Signatures, 257
8.4.7 Concavity Tree, 258
8.4.8 Shape Numbers, 258

9 REPRESENTATION OF THREE-DIMENSIONAL STRUCTURES
9.1 Solids and Their Representation 264
9.2 Surface Representations 265
9.2.1 Surfaces with Faces, 265
9.2.2 Surfaces Based on Splines, 268
9.2.3 Surfaces That Are Functions on the Sphere, 270
9.3 Generalized Cylinder Representations 274
9.3.1 Generalized Cylinder Coordinate Systems and Properties, 275
9.3.2 Extracting Generalized Cylinders, 278
9.3.3 A Discrete Volumetric Version of the Skeleton, 279
9.4 Volumetric Representations 280
9.4.1 Spatial Occupancy, 280
9.4.2 Cell Decomposition, 281
9.4.3 Constructive Solid Geometry, 282
9.4.4 Algorithms for Solid Representations, 284
9.5 Understanding Line Drawings 291
9.5.1 Matching Line Drawings to Three-Dimensional Primitives, 293
9.5.2 Grouping Regions Into Bodies, 294
9.5.3 Labeling Lines, 296
9.5.4 Reasoning About Planes, 301

Part IV RELATIONAL STRUCTURES 313

10 KNOWLEDGE REPRESENTATION AND USE
10.1 Representations 317
10.1.1 The Knowledge Base: Models and Processes, 318
10.1.2 Analogical and Propositional Representations, 319
10.1.3 Procedural Knowledge, 321
10.1.4 Computer Implementations, 322
10.2 Semantic Nets 323
10.2.1 Semantic Net Basics, 323
10.2.2 Semantic Nets for Inference, 327
10.3 Semantic Net Examples 334
10.3.1 Frame Implementations, 334
10.3.2 Location Networks, 335
10.4 Control Issues in Complex Vision Systems 340
10.4.1 Parallel and Serial Computation, 341
10.4.2 Hierarchical and Heterarchical Control, 341
10.4.3 Belief Maintenance and Goal Achievement, 346

11 MATCHING
11.1 Aspects of Matching 352
11.1.1 Interpretation: Construction, Matching, and Labeling, 352
11.1.2 Matching Iconic, Geometric, and Relational Structures, 353
11.2 Graph-Theoretic Algorithms 355
11.2.1 The Algorithms, 357
11.2.2 Complexity, 359
11.3 Implementing Graph-Theoretic Algorithms 360
11.3.1 Matching Metrics, 360
11.3.2 Backtrack Search, 363
11.3.3 Association Graph Techniques, 365
11.4 Matching in Practice 369
11.4.1 Decision Trees, 370
11.4.2 Decision Tree and Subgraph Isomorphism, 375
11.4.3 Informal Feature Classification, 376
11.4.4 A Complex Matcher, 378

12 INFERENCE 383
12.1 First-Order Predicate Calculus 384
12.1.1 Clause-Form Syntax (Informal), 384
12.1.2 Nonclausal Syntax and Logic Semantics (Informal), 385
12.1.3 Converting Nonclausal Form to Clauses, 387
12.1.4 Theorem Proving, 388
12.1.5 Predicate Calculus and Semantic Networks, 390
12.1.6 Predicate Calculus and Knowledge Representation, 392
12.2 Computer Reasoning 395
12.3 Production Systems 396
12.3.1 Production System Details, 398
12.3.2 Pattern Matching, 399
12.3.3 An Example, 401
12.3.4 Production System Pros and Cons, 406
12.4 Scene Labeling and Constraint Relaxation 408
12.4.1 Consistent and Optimal Labelings, 408
12.4.2 Discrete Labeling Algorithms, 410
12.4.3 A Linear Relaxation Operator and a Line Labeling Example, 415
12.4.4 A Nonlinear Operator, 419
12.4.5 Relaxation as Linear Programming, 420
12.5 Active Knowledge 430
12.5.1 Hypotheses, 431
12.5.2 HOW-TO and SO-WHAT Processes, 431
12.5.3 Control Primitives, 431
12.5.4 Aspects of Active Knowledge, 433

13 GOAL ACHIEVEMENT 438
13.1 Symbolic Planning 439
13.1.1 Representing the World, 439
13.1.2 Representing Actions, 441
13.1.3 Stacking Blocks, 442
13.1.4 The Frame Problem, 444
13.2 Planning with Costs 445
13.2.1 Planning, Scoring, and Their Interaction, 446
13.2.2 Scoring Simple Plans, 446
13.2.3 Scoring Enhanced Plans, 451
13.2.4 Practical Simplifications, 452
13.2.5 A Vision System Based on Planning, 453

APPENDICES 465

A1 SOME MATHEMATICAL TOOLS 465
A1.1 Coordinate Systems 465
A1.1.1 Cartesian, 465
A1.1.2 Polar and Polar Space, 465
A1.1.3 Spherical and Cylindrical, 466
A1.1.4 Homogeneous Coordinates, 467
A1.2 Trigonometry 468
A1.2.1 Plane Trigonometry, 468
A1.2.2 Spherical Trigonometry, 469
A1.3 Vectors 469
A1.4 Matrices 471
A1.5 Lines 474
A1.5.1 Two Points, 474
A1.5.2 Point and Direction, 474
A1.5.3 Slope and Intercept, 474
A1.5.4 Ratios, 474
A1.5.5 Normal and Distance from Origin (Line Equation), 475
A1.5.6 Parametric, 476
A1.6 Planes 476
A1.7 Geometric Transformations 477
A1.7.1 Rotation, 477
A1.7.2 Scaling, 478
A1.7.3 Skewing, 479
A1.7.4 Translation, 479
A1.7.5 Perspective, 479
A1.7.6 Transforming Lines and Planes, 480
A1.7.7 Summary, 480
A1.8 Camera Calibration and Inverse Perspective 481
A1.8.1 Camera Calibration, 482
A1.8.2 Inverse Perspective, 483
A1.9 Least-Squared-Error Fitting 484
A1.9.1 Pseudo-Inverse Method, 485
A1.9.2 Principal Axis Method, 486
A1.9.3 Fitting Curves by the Pseudo-Inverse Method, 487
A1.10 Conics 488
A1.11 Interpolation 489
A1.11.1 One-Dimensional, 489
A1.11.2 Two-Dimensional, 490
A1.12 The Fast Fourier Transform 490
A1.13 The Icosahedron 492
A1.14 Root Finding 493

A2 ADVANCED CONTROL MECHANISMS
A2.1 Standard Control Structures 497
A2.1.1 Recursion, 498
A2.1.2 Co-Routining, 498
A2.2 Inherently Sequential Mechanisms 499
A2.2.1 Automatic Backtracking, 499
A2.2.2 Context Switching, 500
A2.3 Sequential or Parallel Mechanisms 500
A2.3.1 Modules and Messages, 500
A2.3.2 Priority Job Queue, 502
A2.3.3 Pattern-Directed Invocation, 504
A2.3.4 Blackboard Systems, 505

AUTHOR INDEX
SUBJECT INDEX

Preface
The dream of intelligent automata goes back to antiquity; its first major articulation in the context of digital computers was by Turing around 1950. Since then, this dream has been pursued primarily by workers in the field of artificial intelligence, whose goal is to endow computers with information-processing capabilities comparable to those of biological organisms. From the outset, one of the goals of artificial intelligence has been to equip machines with the capability of dealing with sensory inputs.

Computer vision is the construction of explicit, meaningful descriptions of physical objects from images. Image understanding is very different from image processing, which studies image-to-image transformations, not explicit description building. Descriptions are a prerequisite for recognizing, manipulating, and thinking about objects.

We perceive a world of coherent three-dimensional objects with many invariant properties. Objectively, the incoming visual data do not exhibit corresponding coherence or invariance; they contain much irrelevant or even misleading variation. Somehow our visual system, from the retinal to cognitive levels, understands, or imposes order on, chaotic visual input. It does so by using intrinsic information that may reliably be extracted from the input, and also through assumptions and knowledge that are applied at various levels in visual processing.

The challenge of computer vision is one of explicitness. Exactly what information about scenes can be extracted from an image using only very basic assumptions about physics and optics? Explicitly, what computations must be performed? Then, at what stage must domain-dependent, prior knowledge about the world be incorporated into the understanding process? How are world models and knowledge represented and used? This book is about the representations and mechanisms that allow image information and prior knowledge to interact in image understanding.

Computer vision is a relatively new and fast-growing field. The first experiments were conducted in the late 1950s, and many of the essential concepts have been developed during the last five years. With this rapid growth, crucial ideas have arisen in disparate areas such as artificial intelligence, psychology, computer graphics, and image processing. Our intent is to assemble a selection of this material in a form that will serve both as a senior/graduate-level academic text and as a useful reference to those building vision systems. This book has a strong artificial intelligence flavor, and we hope this will provoke thought. We believe that both the intrinsic image information and the internal model of the world are important in successful vision systems.

The book is organized into four parts, based on descriptions of objects at four different levels of abstraction.

1. Generalized images: images and image-like entities.
2. Segmented images: images organized into subimages that are likely to correspond to "interesting objects."
3. Geometric structures: quantitative models of image and world structures.
4. Relational structures: complex symbolic descriptions of image and world structures.

The parts follow a progression of increasing abstractness. Although the four parts are most naturally studied in succession, they are not tightly interdependent. Part I is a prerequisite for Part II, but Parts III and IV can be read independently.

Parts of the book assume some mathematical and computing background (calculus, linear algebra, data structures, numerical methods). However, throughout the book mathematical rigor takes a back seat to concepts. Our intent is to transmit a set of ideas about a new field to the widest possible audience.

In one book it is impossible to do justice to the scope and depth of prior work in computer vision. Further, we realize that in a fast-developing field, the rapid influx of new ideas will continue. We hope that our readers will be challenged to think, criticize, read further, and quickly go beyond the confines of this volume.
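The preface separates image processing (image in, image out) from image understanding (image in, explicit description out). The Python sketch below is only an illustration of that contrast, under stated assumptions: the helper names `smooth` and `describe`, the 3x3 box filter, the threshold of 128, and the assumption that every bright pixel belongs to one object are all hypothetical choices made here, not methods taken from this book.

```python
# Illustrative sketch only: hypothetical helpers contrasting the two kinds of output.

def smooth(image):
    """Image processing: a 3x3 box (mean) filter. Image in, image out.

    Border pixels are copied unchanged to keep the sketch short.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = sum(image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total // 9  # integer mean of the 3x3 neighborhood
    return out


def describe(image, threshold=128):
    """Image understanding in miniature: image in, explicit description out.

    Assumes (hypothetically) that all pixels at or above `threshold`
    belong to a single bright object.
    """
    pixels = [(x, y) for y, row in enumerate(image)
              for x, v in enumerate(row) if v >= threshold]
    if not pixels:
        return {"bright_object": None}
    area = len(pixels)
    cx = sum(x for x, _ in pixels) / area
    cy = sum(y for _, y in pixels) / area
    return {"bright_object": {"area": area, "centroid": (cx, cy)}}


if __name__ == "__main__":
    # A 5x5 test image: a bright 3x3 square centered on a dark background.
    img = [[200 if 1 <= x <= 3 and 1 <= y <= 3 else 0 for x in range(5)] for y in range(5)]
    print(smooth(img))    # still a 5x5 array of numbers
    print(describe(img))  # {'bright_object': {'area': 9, 'centroid': (2.0, 2.0)}}
```

The point of the pairing is the type of the result: `smooth` hands back another array that still has to be interpreted, while `describe` returns a statement about an object (its size and position) that later reasoning can use directly.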


Acknowledgments
Jerry Feldman and Herb Voelcker (and through them the University of Rochester) provided many resources for this work. One of the most important was a capable and forgiving staff (secretarial, technical, and administrative). For massive text editing, valuable advice, and good humor we are especially grateful to Rose Peet. Peggy Meeker, Jill Orioli, and Beth Zimmerman all helped at various stages.

Several colleagues made suggestions on early drafts: thanks to James Allen, Norm Badler, Larry Davis, Takeo Kanade, John Kender, Daryl Lawton, Joseph O'Rourke, Ari Requicha, Ed Riseman, Azriel Rosenfeld, Mike Schneier, Ken Sloan, Steve Tanimoto, Marty Tenenbaum, and Steve Zucker.

Graduate students helped in many different ways: thanks especially to Michel Denber, Alan Frisch, Lydia Hrechanyk, Mark Kahrs, Keith Lantz, Joe Maleson, Lee Moore, Mark Peairs, Don Perlis, Rick Rashid, Dan Russell, Dan Sabbah, Bob Schudy, Peter Selfridge, Uri Shani, and Bob Tilove. Bernhard Stuth deserves special mention for much careful and critical reading.

Finally, thanks go to Jane Ballard, mostly for standing steadfast through the cycles of elation and depression and for numerous engineering-to-English translations.

As Pat Winston put it: "A willingness to help is not an implied endorsement." The aid of others was invaluable, but we alone are responsible for the opinions, technical details, and faults of this book.

Funding assistance was provided by the Sloan Foundation under Grant 784 15, by the National Institutes of Health under Grant HL21253, and by the Defense Advanced Research Projects Agency under Grant N00014-78-C-0164.

The authors wish to credit the following sources for figures and tables. For complete citations given here in abbreviated form (as "from ..." or "after ..."), refer to the appropriate chapter-end references.
Fig. 1.2 from Shani, U., "A 3-D model-driven system for the recognition of abdominal anatomy from CT scans," TR77, Dept. of Computer Science, University of Rochester, May 1980.

Fig. 1.4 courtesy of Allen Hanson and Ed Riseman, COINS Research Project, University of Massachusetts, Amherst, MA. Fig. 2.4 after Horn and Sjoberg, 1978. Figs. 2.5, 2.9, 2.10, 3.2, 3.6, and 3.7 courtesy of Bill Lampeter. Fig. 2.7a painting by Louis Condax; courtesy of Eastman Kodak Company and the Optical Society of America. Fig. 2.8a courtesy of D. Greenberg and G. Joblove, Cornell Program of Computer Graphics. Fig. 2.8b courtesy of Tom Check. Table 2.3 after Gonzalez and Wintz, 1977. Fig. 2.18 courtesy of EROS Data Center, Sioux Falls, SD. Figs. 2.19 and 2.20 from Herrick, C.N., Television Theory and Servicing: Black/White and Color, 2nd Ed. Reston, VA: Reston, 1976. Figs. 2.21, 2.22, 2.23, and 2.24 courtesy of Michel Denber. Fig. 2.25 from Popplestone et al., 1975. Fig. 2.26 courtesy of Production Automation Project, University of Rochester. Fig. 2.27 from Waag and Gramiak, 1976. Fig. 3.1 courtesy of Marty Tenenbaum. Fig. 3.8 after Horn, 1974. Figs. 3.14 and 3.15 after Frei and Chen, 1977. Figs. 3.17 and 3.18 from Zucker, S.W. and R.A. Hummel, "An optimal 3-D edge operator," IEEE Trans. PAMI 3, May 1981, pp. 324-331. Fig. 3.19 curves are based on data in Abdou, 1978. Figs. 3.20, 3.21, and 3.22 from Prager, J.M., "Extracting and labeling boundary segments in natural scenes," IEEE Trans. PAMI 2, 1, January 1980. © 1980 IEEE. Figs. 3.23, 3.28, 3.29, and 3.30 courtesy of Berthold Horn. Figs. 3.24 and 3.26 from Marr, D. and T. Poggio, "Cooperative computation of stereo disparity," Science, Vol. 194, 1976, pp. 283-287. © 1976 by the American Association for the Advancement of Science. Fig. 3.31 from Woodham, R.J., "Photometric stereo: A reflectance map technique for determining surface orientation from image intensity," Proc. SPIE, Vol. 155, August 1978. Figs. 3.33 and 3.34 after Horn and Schunck, 1980. Fig. 3.37 from Tanimoto, S. and T. Pavlidis, "A hierarchical data structure for picture processing," CGIP 4, 2, June 1975, pp. 104-119. Fig. 4.6 from Kimme et al., 1975. Figs. 4.7 and 4.16 from Ballard and Sklansky, 1976. Fig. 4.9 courtesy of Dana Ballard and Ken Sloan. Figs. 4.12 and 4.13 from Ramer, U., "Extraction of line structures from photographs of curved objects," CGIP 4, 2, June 1975, pp. 81-103. Fig. 4.14 courtesy of Jim Lester, Tufts/New England Medical Center. Fig. 4.17 from Chien, Y.P. and K.S. Fu, "A decision function method for boundary detection," CGIP 3, 2, June 1974, pp. 125-140. Fig. 5.3 from Ohlander, R., K. Price, and D.R. Reddy, "Picture segmentation using a recursive region splitting method," CGIP 8, 3, December 1979. Fig. 5.4 courtesy of Sam Kapilivsky. Figs. 6.1, 11.16, and A1.13 courtesy of Chris Brown. Fig. 6.3 courtesy of Joe Maleson and John Kender. Fig. 6.4 from Connors, 1979. Texture images by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966. Fig. 6.9 texture image by Phil Brodatz, in Brodatz, Textures. New York: Dover, 1966. Figs. 6.11, 6.12, and 6.13 from Lu, S.Y. and K.S. Fu, "A syntactic approach to texture analysis," CGIP 7, 3, June 1978, pp. 303-330.

Fig. 6.14 from Jayaramamurthy, S.N., "Multilevel array grammars for generating texture scenes," Proc. PRIP, August 1979, pp. 391-398. © 1979 IEEE. Fig. 6.20 from Laws, 1980. Figs. 6.21 and 6.22 from Maleson et al., 1977. Fig. 6.23 courtesy of Joe Maleson. Figs. 7.1 and 7.3 courtesy of Daryl Lawton. Fig. 7.2 after Prager, 1979. Figs. 7.4 and 7.5 from Clocksin, W.F., "Computer prediction of visual thresholds for surface slant and edge detection from optical flow fields," Ph.D. dissertation, University of Edinburgh, 1980. Fig. 7.7 courtesy of Steve Barnard and Bill Thompson. Figs. 7.8 and 7.9 from Rashid, 1980. Fig. 7.10 courtesy of Joseph O'Rourke. Figs. 7.11 and 7.12 after Aggarwal and Duda, 1975. Fig. 7.13 courtesy of Hans-Hellmut Nagel. Fig. 8.1d after Requicha, 1977. Figs. 8.2, 8.3, 8.21a, 8.22, and 8.26 after Pavlidis, 1977. Figs. 8.10, 8.11, 9.6, and 9.16 courtesy of Uri Shani. Figs. 8.12, 8.13, 8.14, 8.15, and 8.16 from Ballard, 1981. Fig. 8.21b from Preston, K., Jr., M.J.B. Duff, S. Levialdi, P.E. Norgren, and J.-I. Toriwaki, "Basics of cellular logic with some applications in medical image processing," Proc. IEEE, Vol. 67, No. 5, May 1979, pp. 826-856. Figs. 8.25, 9.8, 9.9, 9.10, and 11.3 courtesy of Robert Schudy. Fig. 8.29 after Bribiesca and Guzman, 1979. Figs. 9.1, 9.18, 9.19, and 9.27 courtesy of Ari Requicha. Fig. 9.2 from Requicha, A.A.G., "Representations for rigid solids: theory, methods, systems," Computing Surveys 12, 4, December 1980. Fig. 9.3 courtesy of Lydia Hrechanyk. Figs. 9.4 and 9.5 after Baumgart, 1972. Fig. 9.7 courtesy of Peter Selfridge. Fig. 9.11 after Requicha, 1980. Figs. 9.14 and 9.15b from Agin, G.J. and T.O. Binford, "Computer description of curved objects," IEEE Trans. on Computers 25, 1, April 1976. Fig. 9.15a courtesy of Gerald Agin. Fig. 9.17 courtesy of A. Christensen; published as frontispiece of ACM SIGGRAPH 80 Proceedings. Fig. 9.20 from Marr and Nishihara, 1978. Fig. 9.21 after Tilove, 1980. Fig. 9.22b courtesy of Gene Hartquist. Figs. 9.24, 9.25, and 9.26 from Lee and Requicha, 1980. Figs. 9.28a, 9.29, 9.30, 9.31, 9.32, 9.35, and 9.37 and Table 9.1 from Brown, C. and R. Popplestone, "Cases in scene analysis," in Pattern Recognition, ed. B.G. Batchelor. New York: Plenum, 1978. Fig. 9.28b from Guzman, A., "Decomposition of a visual scene into three-dimensional bodies," in Automatic Interpretation and Classification of Images, A. Grasseli, ed., New York: Academic Press, 1969. Fig. 9.28c from Waltz, D., "Understanding line drawings of scenes with shadows," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975. Fig. 9.28d after Turner, 1974. Figs. 9.33, 9.38, 9.40, 9.42, 9.43, and 9.44 after Mackworth, 1973.


Figs. 9.39, 9.45, 9.46, and 9.47 and Table 9.2 after Kanade, 1978. Figs. 10.2 and A2.1 courtesy of Dana Ballard. Figs. 10.16, 10.17, and 10.18 after Russell, 1979. Fig. 11.5 after Fischler and Elschlager, 1973. Fig. 11.8 after Ambler et al., 1975. Fig. 11.10 from Winston, P.H., "Learning structural descriptions from examples," in The Psychology of Computer Vision, ed. P.H. Winston. New York: McGraw-Hill, 1975. Fig. 11.11 from Nevatia, 1974. Fig. 11.12 after Nevatia, 1974. Fig. 11.17 after Barrow and Popplestone, 1971. Fig. 11.18 from Davis, L.S., "Shape matching using relaxation techniques," IEEE Trans. PAMI 1, 4, January 1979, pp. 60-72. Figs. 12.4 and 12.5 from Sloan and Bajcsy, 1979. Fig. 12.6 after Barrow and Tenenbaum, 1976. Fig. 12.8 after Freuder, 1978. Fig. 12.10 from Rosenfeld, A., R.A. Hummel, and S.W. Zucker, "Scene labeling by relaxation operations," IEEE Trans. SMC 6, 6, June 1976, p. 420. Figs. 12.11, 12.12, 12.13, 12.14, and 12.15 after Hinton, 1979. Fig. 13.3 courtesy of Aaron Sloman. Figs. 13.6, 13.7, and 13.8 from Garvey, 1976. Fig. A1.11 after Duda and Hart, 1973. Figs. A2.2 and A2.3 from Hanson, A.R. and E.M. Riseman, "VISIONS: A computer system for interpreting scenes," in Computer Vision Systems, ed. A.R. Hanson and E.M. Riseman. New York: Academic Press, 1978.


Mnemonics for Proceedings and Special Collections Cited in the References


CGIP  Computer Graphics and Image Processing
COMPSAC  IEEE Computer Society's 3rd International Computer Software and Applications Conference, Chicago, November 1979.
CVS  Hanson, A. R. and E. M. Riseman (Eds.). Computer Vision Systems. New York: Academic Press, 1978.
DARPA IU
  Defense Advanced Research Projects Agency Image Understanding Workshop, Minneapolis, MN, April 1977.
  Defense Advanced Research Projects Agency Image Understanding Workshop, Palo Alto, CA, October 1977.
  Defense Advanced Research Projects Agency Image Understanding Workshop, Cambridge, MA, May 1978.
  Defense Advanced Research Projects Agency Image Understanding Workshop, Carnegie-Mellon University, Pittsburgh, PA, November 1978.
  Defense Advanced Research Projects Agency Image Understanding Workshop, University of Maryland, College Park, MD, April 1980.
IJCAI
  2nd International Joint Conference on Artificial Intelligence, Imperial College, London, September 1971.
  4th International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, USSR, September 1975.
  5th International Joint Conference on Artificial Intelligence, MIT, Cambridge, MA, August 1977.
  6th International Joint Conference on Artificial Intelligence, Tokyo, August 1979.
IJCPR
  2nd International Joint Conference on Pattern Recognition, Copenhagen, August 1974.
  3rd International Joint Conference on Pattern Recognition, Coronado, CA, November 1976.
  4th International Joint Conference on Pattern Recognition, Kyoto, November 1978.
  5th International Joint Conference on Pattern Recognition, Miami Beach, FL, December 1980.
MI4  Meltzer, B. and D. Michie (Eds.). Machine Intelligence 4. Edinburgh: Edinburgh University Press, 1969.
MI5  Meltzer, B. and D. Michie (Eds.). Machine Intelligence 5. Edinburgh: Edinburgh University Press, 1970.
MI6  Meltzer, B. and D. Michie (Eds.). Machine Intelligence 6. Edinburgh: Edinburgh University Press, 1971.
MI7  Meltzer, B. and D. Michie (Eds.). Machine Intelligence 7. Edinburgh: Edinburgh University Press, 1972.
PCV  Winston, P. H. (Ed.). The Psychology of Computer Vision. New York: McGraw-Hill, 1975.
PRIP  IEEE Computer Society Conference on Pattern Recognition and Image Processing, Chicago, August 1979.

Computer Vision
Computer Vision Issues
1.1 ACHIEVING SIMPLE VISION GOALS

Suppose that you are given an aerial photo such as that of Fig. 1.1a and asked to locate ships in it. You may never have seen a naval vessel in an aerial photograph before, but you will have no trouble predicting generally how ships will appear. You might reason that you will find no ships inland, and so turn your attention to ocean areas. You might be momentarily distracted by the glare on the water, but realizing that it comes from reflected sunlight, you perceive the ocean as continuous and flat. Ships on the open ocean stand out easily (if you have seen ships from the air, you know to look for their wakes). Near the shore the image is more confusing, but you know that ships close to shore are either moored or docked. If you have a map (Fig. 1.1b), it can help locate the docks (Fig. 1.1c); in a low-quality photograph it can help you identify the shoreline. Thus it might be a good investment of your time to establish the correspondence between the map and the image. A search parallel to the shore in the dock areas reveals several ships (Fig. 1.1d).

Again, suppose that you are presented with a set of computer-aided tomographic (CAT) scans showing "slices" of the human abdomen (Fig. 1.2a). These images are products of high technology, and give us views not normally available even with x-rays. Your job is to reconstruct from these cross sections the three-dimensional shape of the kidneys. This job may well seem harder than finding ships. You first need to know what to look for (Fig. 1.2b), where to find it in CAT scans, and how it looks in such scans. You need to be able to "stack up" the scans mentally and form an internal model of the shape of the kidney as revealed by its slices (Figs. 1.2c and 1.2d).

This book is about computer vision. These two example tasks are typical computer vision tasks; both were solved by computers using the sorts of knowledge and techniques alluded to in the descriptive paragraphs. Computer vision is the enterprise of automating and integrating a wide range of processes and representations used for vision perception. It includes as parts many techniques that are useful by themselves, such as image processing (transforming, encoding, and transmitting images) and statistical pattern classification (statistical decision theory applied to general patterns, visual or otherwise). More importantly for us, it includes techniques for geometric modeling and cognitive processing.

1.2 HIGH-LEVEL AND LOW-LEVEL CAPABILITIES

The examples of Section 1.1 illustrate vision that uses cognitive processes, geometric models, goals, and plans. These high-level processes are very important; our examples only weakly illustrate their power and scope. There surely would be some overall purpose to finding ships; there might be collateral information that there were submarines, barges, or small craft in the harbor, and so forth. CAT scans would be used with several diagnostic goals in mind and an associated medical history available. Goals and knowledge are high-level capabilities that can guide visual activities, and a visual system should be able to take advantage of them.

Fig. 1.1 Finding ships in an aerial photograph. (a) The photograph; (b) a corresponding map; (c) the dock area of the photograph; (d) registered map and image, with ship location.


Even such elaborated tasks are very special ones and in their way easier to think about than the commonplace visual perceptions needed to pick up a baby, cross a busy street, or arrive at a party and quickly " s e e " who you know, your host's taste in decor, and how long the festivities have been going on. All these tasks require judgment and large amounts of knowledge of objects in the world, how they look, and how they behave. Such highlevel powers are so well in tegrated into "vision" astobeeffectively inseparable. Knowledge and goals areonly part ofthe visionstory. Vision requires many lowlevelcapabilities we often take for granted; for example, our ability to extract intrinsicimagesof "lightness," "color," and "range." We perceive black as black in a complex scene even when the lighting is such that some black patches are reflecting more light than some white patches. Similarly, perceived colorsare not related simply to the wavelengths of reflected light; if they were, we would con sciously see colors changing with illumination. Stereo fusion (stereopsis) isa low levelfacility basictoshortrange threedimensional perception. An important lowlevel capability is objectperception:for our purposes it does notreallymatter ifthistalent isinnate, ("hardwired"), orifitisdevelopmental or even learned ("compiledin"). Thefact remains that mature biological visionsys tems are specialized and tuned to deal with the relevant objects in their environ
Sec. 1.2 HighLevel and LowLevel Capabilities 3

Fig. 1.2 Findingakidney inacomputeraided lomographicscan, (a)Onesliceofscan data; (b)prototype kidney model; (c)model fitting; (d)resultingkidneyand spinalcordinstances.

ments.Further specialization can often belearned, butitisbuilt onbasicimmut ableassumptionsabouttheworldwhichunderliethevisionsystem. Abasicsortofobject recognition capability isthe "figure/ground" discrimi nation that separates objects from the "background." Other basicorganizational predispositions are revealed by the "Gestalt laws" ofclustering, which demon strate rules our vision systems use to form simple arrays of stimuli into more coherent spatial groups. Adramatic example ofspecialized object perception for

Ch. 7 Computer Vision

human beingsisrevealed inour"face recognition"capability,whichseemstooc cupyalargevolumeofbrainmatter.Geometricvisualillusionsaremoresurprising symptoms ofnonintuitive processingthat isperformed byour visionsystems,ei therforsomedirectpurposeorasasideeffect ofits specializedarchitecture.Some other illusions clearly reflect the intervention of highlevel knowledge. For in stance,thefamiliar "Necker cubereversal"isgrounded inour threedimensional modelsforcubes. Lowlevelprocessingcapabilitiesareelusive;theyareunconscious,andthey are not well connected to other systems that allow direct introspection. For in stance,ourvisualmemoryfor imagesisquiteimpressive,yetourquantitativever bal descriptions of images are relatively primitive. The biological visual "hardware" has been developed, honed,andspecialized overaverylongperiod. However, its organization and functionality isnot well understood except atex treme levelsofdetailandgeneralitythe behavior ofsmallsetsofcatormonkey corticalcellsandthebehaviorofhumanbeingsinpsychophysicalexperiments. Computer vision isthus immediately faced with avery difficult problem;it must reinvent, withgeneral digital hardware, the most basicandyet inaccessible talents ofspecialized, parallel,andpartly analogbiological visual systems.Figure 1.3 maygiveafeelingfor theproblem;itshowstwovisualrenditionsofafamiliar subject.Theinsetisanormalimage,therestisaplotoftheintensities (graylevels) intheimageagainsttheimagecoordinates.Inotherwords,itdisplays information


Fig. 1.3 Two representations of an image. One is directly accessible to our low-level processes; the other is not.

Sec. 1.2 High-Level and Low-Level Capabilities

with "height" instead of "light." No information is lost, and the display is an image-like object, but we do not immediately see a face in it. The initial representation the computer has to work with is no better; it is typically just an array of numbers from which human beings could extract visual information only very painfully. Skipping the low-level processing we take for granted turns normally effortless perception into a very difficult puzzle.

Computer vision is vitally concerned with both low-level or "early processing" issues and with the high-level and "cognitive" use of knowledge. Where does vision leave off and reasoning and motivation begin? We do not know precisely, but we firmly believe (and hope to show) that powerful, cooperating, rich representations of the world are needed for any advanced vision system. Without them, no system can derive relevant and invariant information from input that is beset with ever-changing lighting and viewpoint, unimportant shape differences, noise, and other large but irrelevant variations. These representations can remove some computational load by predicting or assuming structure for the visual world.

Finally, if a system is to be successful in a variety of tasks, it needs some "meta-level" capabilities: it must be able to model and reason about its own goals and capabilities, and the success of its approaches. These complex and related models must be manipulated by cognitive-like techniques, even though introspectively the perceptual process does not always "feel" to us like cognition.

Computer Vision Systems
1.3 A RANGE OF REPRESENTATIONS

Visual perception is the relation of visual input to previously existing models of the world. There is a large representational gap between the image and the models ("ideas," "concepts") which explain, describe, or abstract the image information. To bridge that gap, computer vision systems usually have a (loosely ordered) range of representations connecting the input and the "output" (a final description, decision, or interpretation). Computer vision then involves the design of these intermediate representations and the implementation of algorithms to construct them and relate them to one another.

We broadly categorize the representations into four parts (Fig. 1.4) which correspond with the organization of this volume. Within each part there may be several layers of representation, or several cooperating representations. Although the sets of representations are loosely ordered from "early" and "low-level" signals to "late" and "cognitive" symbols, the actual flow of effort and information between them is not unidirectional. Of course, not all levels need to be used in each computer vision application; some may be skipped, or the processing may start partway up the hierarchy or end partway down it.

Generalized images (Part I) are iconic (image-like) and analogical representations of the input data. Images may initially arise from several technologies.

Fig. 1.4 Examples of the four categories of representation used in computer vision. (a) Iconic; (b) segmented; (c) geometric; (d) relational.

Domain-independent processing can produce other iconic representations more directly useful to later processing, such as arrays of edge elements (gray-level discontinuities). Intrinsic images can sometimes be produced at this level: they reveal physical properties of the imaged scene (such as surface orientations, range, or surface reflectance). Often parallel processing can produce generalized images. More generally, most "low-level" processes can be implemented with parallel computation.

Segmented images (Part II) are formed from the generalized image by gathering its elements into sets likely to be associated with meaningful objects in the scene. For instance, segmenting a scene of planar polyhedra (blocks) might result in a set of edge segments corresponding to polyhedral edges, or a set of two-dimensional regions in the image corresponding to polyhedral faces. In producing the segmented image, knowledge about the particular domain at issue begins to be important both to save computation and to overcome problems of noise and inadequate data. In the planar polyhedral example, it helps to know beforehand that the line segments must be straight. Texture and motion are known to be very important in segmentation, and are currently topics of active research; knowledge in these areas is developing very fast.

Fig. 1.4 (cont.) [Figure: labeled scene regions Garage, Bushes, Grass, House, Sky, Tree1, Tree2, Side2.]

Geometric representations (Part III) are used to capture the all-important idea of two-dimensional and three-dimensional shape. Quantifying shape is as important as it is difficult. These geometric representations must be powerful enough to support complex and general processing, such as "simulation" of the effects of lighting and motion. Geometric structures are as useful for encoding previously acquired knowledge as they are for re-representing current visual input. Computer vision requires some basic mathematics; Appendix 1 has a brief selection of useful techniques.

Relational models (Part IV) are complex assemblages of representations used to support sophisticated high-level processing. An important tool in knowledge representation is semantic nets, which can be used simply as an organizational convenience or as a formalism in their own right. High-level processing often uses prior knowledge and models acquired prior to a perceptual experience. The basic mode of processing turns from constructing representations to matching them. At high levels, propositional representations become more important. They are made up of assertions that are true or false with respect to a model, and are manipulated by rules of inference. Inference-like techniques can also be used for planning, which models situations and actions through time, and thus must reason about temporally varying and hypothetical worlds. The higher the level of representation, the more marked is the flow of control (direction of attention, allocation of effort) downward to lower levels, and the greater the tendency of algorithms to exhibit serial processing. These issues of control are basic to complex information processing in general and computer vision in particular; Appendix 2 outlines some specific control mechanisms.
Figure 1.5 illustrates the loose classification of the four categories into analogical and propositional representations. We consider generalized and segmented images as well as geometric structures to be analogical models. Analogical models capture directly the relevant characteristics of the represented objects, and are manipulated and interrogated by simulation-like processes. Relational models are generally a mix of analogical and propositional representations. We develop this distinction in more detail in Chapter 10.

1.4 THE ROLE OF COMPUTERS

The computer is a congenial tool for research into visual perception.

Computers are versatile and forgiving experimental subjects. They are easily and ethically reconfigurable, not messy, and their workings can be scrutinized in the finest detail.

Computers are demanding critics. Imprecision, vagueness, and oversights are not tolerated in the computer implementation of a theory.

Computers offer new metaphors for perceptual psychology (also neurology, linguistics, and philosophy). Processes and entities from computer science provide powerful and influential conceptual tools for thinking about perception and cognition.

Computers can give precise measurements of the amount of processing they

Fig. 1.5 The knowledge base of a complex computer vision system, showing four basic representational categories. [Diagram labels: Knowledge base; Analogical models; Analogical/propositional models; Generalized image; Geometric structures; Relational structures.]

Table 1.1 EXAMPLES OF IMAGE ANALYSIS TASKS

Domain | Objects | Modality | Tasks | Knowledge Sources
Robotics | Three-dimensional outdoor scenes; indoor scenes; mechanical parts | Light; X-rays; structured light | Identify or describe objects in scene; industrial tasks | Models of objects; models of the reflection of light from objects
Aerial images | Terrain; buildings, etc. | Light; infrared; radar | Improved images; resource analyses; weather prediction; spying; missile guidance; tactical analysis | Maps; geometrical models of shapes; models of image formation
Astronomy | Stars; planets | Light | Chemical composition; improved images | Geometrical models of shapes
Medical (macro) | Body organs | X-rays; ultrasound; isotopes; heat | Diagnosis of abnormalities; operative and treatment planning | Anatomical models; models of image formation
Medical (micro) | Cells; protein chains; chromosomes | Electron microscopy; light | Pathology, cytology; karyotyping | Models of shape
Chemistry | Molecules | Electron densities | Analysis of molecular compositions | Chemical models; structured models
Neuroanatomy | Neurons | Light; electron microscopy | Determination of spatial orientation | Neural connectivity
Physics | Particle tracks | Light | Find new particles; identify tracks | Atomic physics

do. A computer implementation places an upper limit on the amount of computation necessary for a task.

Computers may be used either to mimic what we understand about human perceptual architecture and processes, or to strike out in different directions to try to achieve similar ends by different means.

Computer models may be judged either by their efficacy for applications and on-the-job performance or by their internal organization, processes, and structures: the theory they embody.
1.5 COMPUTER VISION RESEARCH AND APPLICATIONS

"Pure" computer vision research often deals with relatively domain-independent considerations. The results are useful in a broad range of contexts. Almost always such work is demonstrated in one or more applications areas, and more often than not an initial application problem motivates consideration of the general problem. Applications of computer vision are exciting, and their number is growing as computer vision becomes better understood. Table 1.1 gives a partial list of "classical" and current applications areas.

Within the organization outlined above, this book presents many specific ideas and techniques with general applicability. It is meant to provide enough basic knowledge and tools to support attacks on both applications and research topics.


GENERALIZED IMAGES

The first step in the vision process is image formation. Images may arise from a variety of technologies. For example, most television-based systems convert reflected light intensity into an electronic signal which is then digitized; other systems use more exotic radiations, such as x-rays, laser light, ultrasound, and heat. The net result is usually an array of samples of some kind of energy.

The vision system may be entirely passive, taking as input a digitized image from a microwave or infrared sensor, satellite scanner, or a planetary probe, but more likely involves some kind of active imaging. Automated active imaging systems may control the direction and resolution of sensors, or regulate and direct their own light sources. The light source itself may have special properties and structure designed to reveal the nature of the three-dimensional world; an example is to use a plane of light that falls on the scene in a stripe whose structure is closely related to the structure of opaque objects. Range data for the scene may be provided by stereo (two images), but also by triangulation using light-stripe techniques or by "spot ranging" using laser light. A single hardware device may deliver range and multispectral reflectivity ("color") information. The image-forming device may also perform various other operations. For example, it may automatically smooth or enhance the image or vary its resolution.

The generalized image is a set of related image-like entities for the scene. This set may include related images from several modalities, but may also include the results of significant processing that can extract intrinsic images.
Anintrinsicimage isan"image," orarray,ofrepresentations ofanimportant physicalquantitysuch assurface orientation, occludingcontours, velocity,orrange.Objectcolor,which is a different entity from sensed redgreenblue wavelengths, is an intrinsic quality.Theseintrinsicphysicalqualitiesareextremely useful; theycanberelated tophysicalobjectsfarmoreeasilythantheoriginalinput values,which reveal the physicalparametersonlyindirectly.Anintrinsicimageisamajorsteptowardscene understandingandusuallyrepresentssignificant andinterestingcomputations.
Part I Generalized Images

The information necessary to compute an intrinsic image is contained in the input image itself, and is extracted by "inverting" the transformation wrought by the imaging process, the reflection of radiation from the scene, and other physical processes. An example is the fusion of two stereo images to yield an intrinsic range image. Many algorithms to recover intrinsic images can be realized with parallel implementations, mirroring computations that may take place in the lower neurological levels of biological image processing.

All of the computations listed above benefit from the idea of resolution pyramids. A pyramid is a generalized image data structure consisting of the same image at several successively increasing levels of resolution. As the resolution increases, more samples are required to represent the increased information and hence the successive levels are larger, making the entire structure look like a pyramid. Pyramids allow the introduction of many different coarse-to-fine image resolution algorithms which are vastly more efficient than their single-level, high-resolution-only counterparts.
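A resolution pyramid of the kind just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the input is a square gray-level image whose side is a power of two, and each coarser level is formed by simple 2 x 2 block averaging. The averaging rule and the names `reduce_2x2` and `build_pyramid` are our own illustrative choices. The text does not prescribe this construction.

```python
from typing import List

Image = List[List[float]]  # row-major gray levels


def reduce_2x2(img: Image) -> Image:
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)]
            for r in range(h)]


def build_pyramid(img: Image) -> List[Image]:
    """Return the levels ordered from coarsest (1x1) to finest (the input).

    Assumes a square image whose side is a power of two.
    """
    levels = [img]
    while len(levels[-1]) > 1:
        levels.append(reduce_2x2(levels[-1]))
    return levels[::-1]


if __name__ == "__main__":
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    for level in build_pyramid(image):
        print(level)
```

For the 4 x 4 ramp image in the usage example, the levels have sides 1, 2, and 4. The single top-level value, 7.5, is the mean of the whole image. Smoother reduction filters, such as a small Gaussian, are a common alternative to plain block averaging.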


2 IMAGE FORMATION

2.1 IMAGES

Image formation occurs when a sensor registers radiation that has interacted with physical objects. Section 2.2 deals with mathematical models of images and image formation. Section 2.3 describes several specific image formation technologies.

The mathematical model of imaging has several different components.

1. An image function is the fundamental abstraction of an image.
2. A geometrical model describes how three dimensions are projected into two.
3. A radiometrical model shows how the imaging geometry, light sources, and reflectance properties of objects affect the light measurement at the sensor.
4. A spatial frequency model describes how spatial variations of the image may be characterized in a transform domain.
5. A color model describes how different spectral measurements are related to image colors.
6. A digitizing model describes the process of obtaining discrete samples.

This material forms the basis of much image-processing work and is developed in much more detail elsewhere, e.g., [Rosenfeld and Kak 1976; Pratt 1978]. Our goals are not those of image processing, so we limit our discussion to a summary of the essentials.

The wide range of possible sources of samples and the resulting different implications for later processing motivate our overview of specific imaging techniques. Our goal is not to provide an exhaustive catalog, but rather to give an idea of the range of techniques available. Very different analysis techniques may be needed depending on how the image was formed. Two examples illustrate this point. If the image is formed by reflected light intensity, as in a photograph, the image records both light from primary light sources and (more usually) the light reflected off physical surfaces. We show in Chapter 3 that in certain cases we can use these kinds of images together with knowledge about physics to derive the orientation of the surfaces. If, on the other hand, the image is a computed tomogram of the human body (discussed in Section 2.3.4), the image represents tissue density of internal organs. Here orientation calculations are irrelevant, but general segmentation techniques of Chapters 4 and 5 (the agglomeration of neighboring samples of similar density into units representing organs) are appropriate.

2.2 IMAGE MODEL

Sophisticated image models of a statistical flavor are useful in image processing [Jain 1981]. Here we are concerned with more geometrical considerations.
2.2.1 Image Functions

An image function is a mathematical representation of an image. Generally, an image function is a vector-valued function of a small number of arguments. A special case of the image function is the digital (discrete) image function, where the arguments to and value of the function are all integers. Different image functions may be used to represent the same image, depending on which of its characteristics are important. For instance, a camera produces an image on black-and-white film which is usually thought of as a real-valued function (whose value could be the density of the photographic negative) of two real-valued arguments, one for each of two spatial dimensions. However, at a very small scale (the order of the film grain) the negative basically has only two densities, "opaque" and "transparent."

Most images are represented by functions of two spatial variables $f(\mathbf{x}) = f(x, y)$, where $f(x, y)$ is the brightness of the gray level of the image at a spatial coordinate $(x, y)$. A multispectral image $\mathbf{f}$ is a vector-valued function with components $(f_1, \ldots, f_n)$. One special multispectral image is a color image in which, for example, the components measure the brightness values of each of three wavelengths, that is,

$$ \mathbf{f}(\mathbf{x}) = \left[ f_{\mathrm{red}}(\mathbf{x}),\ f_{\mathrm{blue}}(\mathbf{x}),\ f_{\mathrm{green}}(\mathbf{x}) \right] $$

Time-varying images $f(\mathbf{x}, t)$ have an added temporal argument. For special three-dimensional images, $\mathbf{x} = (x, y, z)$. Usually, both the domain and range of $f$ are bounded.

An important part of the formation process is the conversion of the image representation from a continuous function to a discrete function; we need some way of describing the images as samples at discrete points. The mathematical tool we shall use is the delta function. Formally, the delta function may be defined by


Ch. 2 Image Formation

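$$ \delta(x) = \begin{cases} \infty & \text{when } x = 0 \\ 0 & \text{when } x \neq 0 \end{cases} \qquad \int_{-\infty}^{\infty} \delta(x) \, dx = 1 \qquad (2.1) $$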
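If some care is exercised, the delta function may be interpreted as the limit of a set of functions:

$$ \delta(x) = \lim_{n \to \infty} \delta_n(x) $$

where

$$ \delta_n(x) = \begin{cases} n & \text{if } |x| < \dfrac{1}{2n} \\ 0 & \text{otherwise} \end{cases} \qquad (2.2) $$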

A useful property of the delta function is the sifting property:

$$ \int_{-\infty}^{\infty} f(x) \, \delta(x - a) \, dx = f(a) \qquad (2.3) $$

A continuous image may be multiplied by a two-dimensional "comb," or array of delta functions, to extract a finite number of discrete samples (one for each delta function). This mathematical model of the sampling process will be useful later.
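The limiting construction (2.2) and the sifting property (2.3) can be checked numerically. The sketch below assumes that a midpoint-rule sum is an adequate stand-in for the continuous integral. The helper names `delta_n`, `sift`, and `comb_sample` are hypothetical. As n grows, the integral of f(x)δ_n(x − a) approaches f(a). A comb of deltas spaced x0 apart reduces to reading off f at the points k·x0.

```python
import math
from typing import Callable, List

Func = Callable[[float], float]


def delta_n(x: float, n: float) -> float:
    """Rectangular pulse of Eq. (2.2): height n on |x| < 1/(2n), so unit area."""
    return n if abs(x) < 1.0 / (2.0 * n) else 0.0


def sift(f: Func, a: float, n: float, steps: int = 10_000) -> float:
    """Midpoint-rule estimate of the integral of f(x) * delta_n(x - a).

    delta_n(x - a) vanishes outside [a - 1/(2n), a + 1/(2n)], so only that
    interval is integrated.
    """
    half = 1.0 / (2.0 * n)
    lo = a - half
    dx = 2.0 * half / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += f(x) * delta_n(x - a, n) * dx
    return total


def comb_sample(f: Func, x0: float, count: int) -> List[float]:
    """Sampling by a comb of deltas at x = k*x0: each delta sifts out f(k*x0)."""
    return [f(k * x0) for k in range(count)]


if __name__ == "__main__":
    for n in (1.0, 10.0, 1000.0):
        print(n, sift(math.cos, 0.3, n), math.cos(0.3))
    print(comb_sample(math.sin, 0.5, 4))
```

For small n, the estimate is the average of f over a wide pulse and differs visibly from f(a). By n = 1000 it agrees with cos(0.3) to better than one part in a million.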
2.2.2 Imaging Geometry

Monocular Imaging

Point projection is the fundamental model for the transformation wrought by our eye, by cameras, or by numerous other imaging devices. To a first-order approximation, these devices act like a pinhole camera in that the image results from projecting scene points through a single point onto an image plane (see Fig. 2.1). In Fig. 2.1, the image plane is behind the point of projection, and the image is reversed. However, it is more intuitive to recompose the geometry so that the point of projection corresponds to a viewpoint behind the image plane, and the image occurs right side up (Fig. 2.2). The mathematics is the same, but now the viewpoint is +f on the z-axis, with the z = 0 plane being the image plane upon which the image is projected. (f is sometimes called the focal length in this context. The use of f in this section should not be confused with the use of f for image function.)

As the imaged object approaches the viewpoint, its projection gets bigger (try moving your hand toward your eye). To specify how its imaged size changes, one needs only the geometry of similar triangles. In Fig. 2.2b, y', the projected height of the object, is related to its real height y, its position z, and the focal length f by

$$ \frac{y'}{f} = \frac{y}{f - z} \qquad (2.4) $$

Fig. 2.1 A geometric camera model.

The case for x' is treated similarly:

$$ \frac{x'}{f} = \frac{x}{f - z} \qquad (2.5) $$

The projected image has z = 0 everywhere. However, projecting away the z component is best considered a separate transformation; the projective transform is usually thought to distort the z component just as it does the x and y. Perspective distortion thus maps (x, y, z) to

$$ (x', y', z') = \left( \frac{fx}{f - z},\ \frac{fy}{f - z},\ \frac{fz}{f - z} \right) \qquad (2.6) $$
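The similar-triangles relations (2.4) to (2.6) translate directly into code. The sketch uses the conventions just stated: the viewpoint is at +f on the z-axis and the image plane is z = 0. It additionally assumes that scene points lie on the far side of the image plane, at z < 0. The function names `perspective` and `project` are hypothetical. Dropping z' after the distortion yields the perspective picture.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def perspective(x: float, y: float, z: float, f: float) -> Point3:
    """Perspective distortion of Eq. (2.6): viewpoint at (0, 0, f), image plane z = 0."""
    if z == f:
        raise ValueError("point lies in the plane z = f through the viewpoint")
    s = f / (f - z)  # common similar-triangles factor of Eqs. (2.4)-(2.6)
    return (s * x, s * y, s * z)


def project(x: float, y: float, z: float, f: float) -> Tuple[float, float]:
    """Perspective picture: distort with Eq. (2.6), then drop z' orthographically."""
    xp, yp, _ = perspective(x, y, z, f)
    return (xp, yp)


if __name__ == "__main__":
    f = 1.0
    for z in (-1.0, -3.0, -9.0):  # assumed: scene points lie at z < 0
        print(z, project(0.0, 1.0, z, f), perspective(0.0, 1.0, z, f)[2])
```

Take f = 1 and a point of height y = 1. It projects to y' = 0.5 at z = −1 and to y' = 0.25 at z = −3. The distorted depth z' approaches −f as z recedes toward −∞. This is the "pushed out of shape" object whose z' still encodes distance.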

The perspective transformation yields orthographic projection as a special case when the viewpoint is the point at infinity in the z direction. Then all objects are projected onto the viewing plane with no distortion of their x and y coordinates.

The perspective distortion yields a three-dimensional object that has been "pushed out of shape"; it is more shrunken the farther it is from the viewpoint. The z component is not available directly from a two-dimensional image, being identically equal to zero. In our model, however, the distorted z component has information about the distance of imaged points from the viewpoint. When this distorted object is projected orthographically onto the image plane, the result is a perspective picture. Thus, to achieve the effect of railroad tracks appearing to come together in the distance, the perspective distortion transforms the tracks so that they do come together (at a point at infinity)! The simple orthographic projection that projects away the z component unsurprisingly preserves this distortion. Several properties of the perspective transform are of interest and are investigated further in Appendix 1.

Binocular Imaging

Basic binocular imaging geometry is shown in Fig. 2.3a. For simplicity, we

Fig. 2.2 (a) Camera model equivalent to that of Fig. 2.1; (b) definition of terms.

use a system with two viewpoints. In this model the eyes do not converge; they are aimed in parallel at the point at infinity in the z direction. The depth information about a point is then encoded only by its different positions (disparity) in the two image planes. With the stereo arrangement of Fig. 2.3,

$$ x' = \frac{(x - d)f}{f - z}, \qquad x'' = \frac{(x + d)f}{f - z} $$

where (x', y') and (x'', y'') are the retinal coordinates for the world point imaged

Fig. 2.3 A nonconvergent binocular imaging system.

through each eye. The baseline of the binocular system is 2d. Thus

$$ (f - z)x' = (x - d)f \qquad (2.7) $$

$$ (f - z)x'' = (x + d)f \qquad (2.8) $$

Subtracting (2.7) from (2.8) gives

$$ (f - z)(x'' - x') = 2df $$

or

$$ z = f - \frac{2df}{x'' - x'} \qquad (2.9) $$

Thus if points can be matched to determine the disparity (x'' − x') and the baseline and focal length are known, the z coordinate is simple to calculate.

If the system can converge its directions of view to a finite distance, convergence angle may also be used to compute depth. The hardest part of extracting depth information from stereo is the matching of points for disparity calculations. "Light striping" is a way to maintain geometric simplicity and also simplify matching (Section 2.3.3).

2.2.3 Reflectance

Terminology

A basic aspect of the imaging process is the physics of the reflectance of objects, which determines how their "brightness" in an image depends on their inherent characteristics and the geometry of the imaging situation. A clear presentation of the mathematics of reflectance is given in [Horn and Sjoberg 1978; Horn 1977]. Light energy flux is measured in watts; "brightness" is measured with respect to area and solid angle. The radiant intensity I of a source is the exitant flux per unit solid angle:

$$ I = \frac{d\Phi}{d\omega} \quad \text{watts/steradian} \qquad (2.10) $$

Here dω is an incremental solid angle. The solid angle of a small area dA measured perpendicular to a radius r is given by

$$ d\omega = \frac{dA}{r^2} \qquad (2.11) $$

in units of steradians. (The total solid angle of a sphere is 4π.) The irradiance is flux incident on a surface element dA:

$$ E = \frac{d\Phi}{dA} \quad \text{watts/meter}^2 \qquad (2.12) $$

and the flux exitant from the surface is defined in terms of the radiance L, which is the flux emitted per unit foreshortened surface area per unit solid angle:

$$ L = \frac{d^2\Phi}{dA \cos\theta \, d\omega} \quad \text{watts/(meter}^2\text{ steradian)} \qquad (2.13) $$

where θ is the angle between the surface normal and the direction of emission.

Image irradiance f is the "brightness" of the image at a point, and is proportional to scene radiance. A "gray level" is a quantized measurement of image irradiance. Image irradiance depends on the reflective properties of the imaged surfaces as well as on the illumination characteristics. How a surface reflects light depends on its microstructure and physical properties. Surfaces may be matte (dull, flat), specular (mirror-like), or have more complicated reflectivity characteristics (Section 3.5.1). The reflectance r of a surface is given quite generally by its Bidirectional Reflectance Distribution Function (BRDF) [Nicodemus et al. 1977]. The BRDF is the ratio of reflected radiance in the direction towards the viewer to the irradiance in the direction towards a small area of the source.

Effects of Geometry on an Imaging System

Let us now analyze a simple image-forming system shown in Fig. 2.4 with the objective of showing how the gray levels are related to the radiance of imaged objects. Following [Horn and Sjoberg 1978], assume that the imaging device is properly focused. Rays originating in the infinitesimal area $dA_o$ on the object's surface are projected into some area $dA_p$ in the image plane, and no rays from other portions of the object's surface reach this area of the image. The system is assumed to be an ideal one, obeying the laws of simple geometrical optics.

The energy flux per unit area that impinges on the sensor is defined to be $E_p$. To show how $E_p$ is related to the scene radiance L, first consider the flux arriving at the lens from a small surface area $dA_o$. From (2.13) this is given as

$$ d\Phi = dA_o \int L \cos\theta \, d\omega \qquad (2.14) $$

This flux is assumed to arrive at an area $dA_p$ in the imaging plane. Hence the irradiance is given by [using Eq. (2.12)]

$$ E_p = \frac{d\Phi}{dA_p} \qquad (2.15) $$

Now relate $dA_o$ to $dA_p$ by equating the respective solid angles as seen from the lens; that is [making use of Eq. (2.11)],
$$ dA_o \frac{\cos\theta}{f_o^2} = dA_p \frac{\cos\alpha}{f_p^2} \qquad (2.16) $$

Fig. 2.4 Geometry of an image-forming system.

Substituting Eqs. (2.16) and (2.14) into (2.15) gives

$$ E_p = \cos\alpha \left( \frac{f_o}{f_p} \right)^2 \int L \, d\omega \qquad (2.17) $$

The integral is over the solid angle seen by the lens. In most instances we can assume that L is constant over this angle and hence can be removed from the integral. Finally, approximate dω by the area of the lens (of diameter D) foreshortened by cos α, that is, $(\pi/4) D^2 \cos\alpha$, divided by the distance $f_o/\cos\alpha$ squared:
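$$ d\omega = \frac{\pi}{4} \, \frac{D^2 \cos^3\alpha}{f_o^2} \qquad (2.18) $$

so that finally

$$ E_p = L \, \frac{\pi}{4} \left( \frac{D}{f_p} \right)^2 \cos^4\alpha \qquad (2.19) $$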

The interesting results here are that (1) the image irradiance is proportional to the scene radiance L, and (2) the factor of proportionality includes the fourth power of the cosine of the off-axis angle α. Ideally, an imaging device should be calibrated so that the variation in sensitivity as a function of α is removed.

2.2.4 Spatial Properties

The Fourier Transform

An image is a spatially varying function. One way to analyze spatial variations is the decomposition of an image function into a set of orthogonal functions, one such set being the Fourier (sinusoidal) functions. The Fourier transform may be used to transform the intensity image into the domain of spatial frequency. For notational convenience and intuition, we shall generally use as an example the continuous one-dimensional Fourier transform. The results can readily be extended to the discrete case and also to higher dimensions [Rosenfeld and Kak 1976]. In two dimensions we shall denote transform domain coordinates by (u, v). The one-dimensional Fourier transform, denoted $\mathcal{F}$, is defined by

$$ \mathcal{F}[f(x)] = F(u) $$

where

$$ F(u) = \int_{-\infty}^{+\infty} f(x) \exp(-j2\pi ux) \, dx \qquad (2.20) $$

where $j = \sqrt{-1}$. Intuitively, Fourier analysis expresses a function as a sum of sine waves of different frequency and phase. The Fourier transform has an inverse $\mathcal{F}^{-1}[F(u)] = f(x)$. This inverse is given by

$$ f(x) = \int_{-\infty}^{+\infty} F(u) \exp(j2\pi ux) \, du \qquad (2.21) $$
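A discrete counterpart of (2.20) and (2.21) makes the transform pair concrete. This minimal sketch evaluates the sums directly in O(N²) time for a sampled signal of length N. It does not use the fast Fourier transform that practical systems rely on. The names `dft` and `idft` are our own. Placing the 1/N normalization on the inverse is one common convention among several.

```python
import cmath
import math
from typing import List, Sequence


def dft(f: Sequence[complex]) -> List[complex]:
    """Discrete analog of Eq. (2.20): F[u] = sum_x f[x] exp(-j 2 pi u x / N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


def idft(F: Sequence[complex]) -> List[complex]:
    """Discrete analog of Eq. (2.21); the 1/N factor is placed on the inverse."""
    n = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n)) / n
            for x in range(n)]


if __name__ == "__main__":
    # Two cycles of a cosine across N = 8 samples.
    signal = [math.cos(2 * math.pi * 2 * x / 8) for x in range(8)]
    spectrum = dft(signal)
    print([round(abs(v), 6) for v in spectrum])
    print([round(v.real, 6) for v in idft(spectrum)])
```

A cosine making two cycles across eight samples puts all of its weight at u = 2 and u = 6, where index 6 stands for the negative frequency −2. Applying `idft` to the spectrum recovers the original samples, which is the discrete form of the inverse relation (2.21).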

The transform has many useful properties, some of which are summarized in Table 2.1. Common one-dimensional Fourier transform pairs are shown in Table 2.2.

The transform F(u) is simply another representation of the image function. Its meaning can be understood by interpreting Eq. (2.21) for a specific value of x, say $x_0$:

$$ f(x_0) = \int F(u) \exp(j2\pi u x_0) \, du \qquad (2.22) $$

This equation states that a particular point in the image can be represented by a weighted sum of complex exponentials (sinusoidal patterns) at different spatial frequencies u. F(u) is thus a weighting function for the different frequencies. Low spatial frequencies account for the "slowly" varying gray levels in an image, such as the variation of intensity over a continuous surface. High-frequency components are associated with "quickly varying" information, such as edges. Figure 2.5 shows the Fourier transform of an image of rectangles, together with the effects of removing low- and high-frequency components.

The Fourier transform is defined above to be a continuous transform. Although it may be performed instantly by optics, a discrete version of it, the "fast Fourier transform," is almost universally used in image processing and computer vision. This is because of the relative versatility of manipulating the transform in the digital domain as compared to the optical domain. Image processing texts, e.g., [Pratt 1978; Gonzalez and Wintz 1977], discuss the FFT in some detail; we content ourselves with an algorithm for it (Appendix 1).

The Convolution Theorem

Convolution is a very important image processing operation, and is a basic operation of linear systems theory. The convolution of two functions f and g is a function h of a displacement y defined as

$$ h(y) = f * g = \int f(x) \, g(y - x) \, dx \qquad (2.23) $$

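Table 2.1 PROPERTIES OF THE FOURIER TRANSFORM

Property | Spatial domain | Frequency domain
 | $f(x)$, $g(x)$ | $F(u) = \mathcal{F}[f(x)]$, $G(u) = \mathcal{F}[g(x)]$
(1) Linearity | $c_1 f(x) + c_2 g(x)$ ($c_1, c_2$ scalars) | $c_1 F(u) + c_2 G(u)$
(2) Scaling | $f(ax)$ | $\frac{1}{\lvert a \rvert} F\left(\frac{u}{a}\right)$
(3) Shifting | $f(x - x_0)$ | $e^{-j2\pi x_0 u} F(u)$
(4) Symmetry | $F(x)$ | $f(-u)$
(5) Conjugation | $f^*(x)$ | $F^*(-u)$
(6) Convolution | $h(x) = f * g = \int f(x') \, g(x - x') \, dx'$ | $F(u) \, G(u)$
(7) Differentiation | $\frac{d^n f(x)}{dx^n}$ | $(j2\pi u)^n F(u)$

Parseval's theorem:

$$ \int |f(x)|^2 \, dx = \int |F(\xi)|^2 \, d\xi \qquad \int f(x) \, g^*(x) \, dx = \int F(\xi) \, G^*(\xi) \, d\xi $$

Symmetry relations (R = real, I = imaginary, E = even, O = odd, C = complex):

f(x) | F(ξ)
Real (R) | Real part even, imaginary part odd (RE, IO)
Imaginary (I) | RO, IE
RE, IO | R
RO, IE | I
RE | RE
RO | IO
IE | IE
IO | RO
Complex even (CE) | CE
Complex odd (CO) | CO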

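Table 2.2 FOURIER TRANSFORM PAIRS

f(x) | F(ξ)
Rectangle function: $\mathrm{Rect}(x) = 1$ for $\lvert x \rvert \le \frac{1}{2}$, 0 otherwise | $\mathrm{Sinc}(\xi) = \frac{\sin \pi\xi}{\pi\xi}$
Triangle function: $1 - \lvert x \rvert$ for $\lvert x \rvert \le 1$, 0 otherwise | $\mathrm{Sinc}^2(\xi)$
Gaussian: $e^{-\pi x^2}$ | $e^{-\pi \xi^2}$
Unit impulse: $\delta(x)$ | $1$
Unit step: 1 for $x > 0$, 0 for $x < 0$ | $\frac{1}{2}\delta(\xi) + \frac{1}{j2\pi\xi}$
Comb function: $\sum_n \delta(x - n x_0)$ | $\frac{1}{x_0} \sum_n \delta\left(\xi - \frac{n}{x_0}\right)$
$\cos 2\pi\omega_0 x$ | $\frac{1}{2}\left[\delta(\xi - \omega_0) + \delta(\xi + \omega_0)\right]$
$\sin 2\pi\omega_0 x$ | $\frac{j}{2}\left[\delta(\xi + \omega_0) - \delta(\xi - \omega_0)\right]$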

Intuitively, one function is "swept past" (in one dimension) or "rubbed over" (in two dimensions) the other. The value of the convolution at any displacement is the integral of the product of the (relatively displaced) function values. One common phenomenon that is well expressed by a convolution is the formation of an image by an optical system. The system (say a camera) has a "point-spread function," which is the image of a single point. (In linear systems theory, this is the "impulse response," or response to a delta function input.) The ideal point-spread function is, of course, a point. A typical point-spread function is a two-dimensional Gaussian spatial distribution of intensities, but may include such phenomena as diffraction rings. In any event, if the camera is modeled as a linear system (ignoring the added complexity that the point-spread function usually varies over the field of view), the image is the convolution of the point-spread function and the input signal. The point-spread function is rubbed over the perfect input image, thus blurring it.

Fig. 2.5 (a) An image, f(x, y). (b) A rotated version of (a), filtered to enhance high spatial frequencies. (c) Similar to (b), but filtered to enhance low spatial frequencies. (d), (e), and (f) show the logarithm of the power spectrum of (a), (b), and (c). The power spectrum is the squared modulus of the Fourier transform F(u, v). Considered in polar coordinates (ρ, θ), points of small ρ correspond to low spatial frequencies ("slowly varying" intensities), large ρ to high spatial frequencies contributed by "fast" variations such as step edges. The power at (ρ, θ) is determined by the amount of intensity variation at the frequency ρ occurring at the angle θ.

Convolution is also a good model for the application of many other linear operators, such as line-detecting templates. It can be used in another guise (called correlation) to perform matching operations (Chapter 3) which detect instances of subimages or features in an image.

In the spatial domain, the obvious implementation of the convolution operation involves a shift-multiply-integrate operation which is hard to do efficiently. However, multiplication and convolution are "transform pairs," so that the calculation of the convolution in one domain (say the spatial) is simplified by first Fourier transforming to the other (the frequency) domain, performing a multiplication, and then transforming back.

The convolution of f and g in the spatial domain is equivalent to the pointwise product of F and G in the frequency domain,

$$ \mathcal{F}(f * g) = FG \qquad (2.24) $$

We shall show this in a manner similar to [Duda and Hart 1973]. First we prove the shift theorem. If the Fourier transform of f(x) is F(u), defined as

$$ F(u) = \int_x f(x) \exp[-j2\pi(ux)] \, dx \qquad (2.25) $$

then

$$ \mathcal{F}[f(x - a)] = \int_x f(x - a) \exp[-j2\pi(ux)] \, dx \qquad (2.26) $$

changing variables so that x' = x − a and dx = dx',

$$ \mathcal{F}[f(x - a)] = \int_{x'} f(x') \exp\{-j2\pi[u(x' + a)]\} \, dx' \qquad (2.27) $$

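Now $\exp[-j2\pi u(x' + a)] = \exp(-j2\pi ua) \exp(-j2\pi ux')$, where the first term is a constant. This means that

$$ \mathcal{F}[f(x - a)] = \exp(-j2\pi ua) \, F(u) \qquad \text{(shift theorem)} \qquad (2.28) $$

Now we are ready to show that

$$ \mathcal{F}[f(x) * g(x)] = F(u) \, G(u) \qquad (2.29) $$

$$ \mathcal{F}(f * g) = \int_y \left\{ \int_x f(x) \, g(y - x) \, dx \right\} \exp(-j2\pi uy) \, dy = \int_x f(x) \left\{ \int_y g(y - x) \exp(-j2\pi uy) \, dy \right\} dx $$

Recognizing that the terms in braces represent $\mathcal{F}[g(y - x)]$ and applying the shift theorem, we obtain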


c

S(f*g) =J/Gc)exp ( j2irux)G(u)


X

dx

(2.30) (2.31)
Ch. 2 Image Formation

= Fiu)Giu)
30
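The discrete Fourier transform obeys the same shift theorem (2.28) for integer shifts of a periodic sequence, which makes it easy to check numerically. The sketch below assumes NumPy; the test sequence `f`, the shift `a`, and the function names are arbitrary illustrations. `np.roll` wraps samples around the end, matching the periodic assumption, and `np.fft.fftfreq` supplies the frequency u of each DFT bin in cycles per sample.

```python
import numpy as np

def spectrum_of_shifted(f, a):
    """DFT of the shifted sequence f(x - a); np.roll wraps around, as for a periodic signal."""
    return np.fft.fft(np.roll(f, a))

def spectrum_by_shift_theorem(f, a):
    """Shift theorem (2.28): the spectrum of f(x - a) is exp(-j 2 pi u a) F(u)."""
    u = np.fft.fftfreq(len(f))          # frequency of each DFT bin, in cycles per sample
    return np.exp(-2j * np.pi * u * a) * np.fft.fft(f)

f = np.array([0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0])
a = 3
print(np.allclose(spectrum_of_shifted(f, a), spectrum_by_shift_theorem(f, a)))   # True
# A shift changes only the phase, never the magnitude, of each frequency component.
print(np.allclose(np.abs(spectrum_of_shifted(f, a)), np.abs(np.fft.fft(f))))    # True
```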

2.2.5 Color

Not all images are monochromatic; in fact, applications using multispectral images are becoming increasingly common (Section 2.3.2). Further, human beings intuitively feel that color is an important part of their visual experience, and is useful or even necessary for powerful visual processing in the real world. Color vision provides a host of research issues, both for psychology and computer vision. We briefly discuss two aspects of color vision: color spaces and color perception. Several models of the human visual system not only include color but have proven useful in applications [Granrath 1981].

Color Spaces

Color spaces are a way of organizing the colors perceived by human beings. It happens that weighted combinations of stimuli at three principal wavelengths are sufficient to define almost all the colors we perceive. These wavelengths form a natural basis or coordinate system from which the color measurement process can be described. Color perception is not related in a simple way to color measurement, however.

Color is a perceptual phenomenon related to human response to different wavelengths in the visible electromagnetic spectrum [400 (blue) to 700 nanometers (red); a nanometer (nm) is 10⁻⁹ meter]. The sensation of color arises from the sensitivities of three types of neurochemical sensors in the retina to the visible spectrum. The relative response of these sensors is shown in Fig. 2.6. Note that each sensor responds to a range of wavelengths. The illumination source has its own spectral composition f(λ), which is modified by the reflecting surface. Let r(λ) be this reflectance function. Then the measurement R produced by the "red" sensor is given by

$$R = \int f(\lambda)\, r(\lambda)\, h_R(\lambda)\, d\lambda \qquad (2.32)$$
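Equation (2.32) can be evaluated numerically once f, r, and h_R are sampled over the visible range. The sketch below uses the trapezoidal rule; the illuminant, reflectance, and Gaussian sensitivity curves are hypothetical stand-ins (they are not the curves of Fig. 2.6), chosen only to show how the three wavelength-dependent factors combine. The narrow sensor also illustrates why delta-function approximations work: as h_R narrows, R approaches f(λ_R) r(λ_R).

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y taken at positions x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def gaussian(lam, center, width):
    """Unit-area Gaussian, used here as a stand-in sensor sensitivity curve."""
    return np.exp(-0.5 * ((lam - center) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))

def sensor_response(lam, source, reflectance, sensitivity):
    """Approximate R = integral of f(lambda) r(lambda) h_R(lambda) d(lambda), Eq. (2.32)."""
    return trapezoid(source * reflectance * sensitivity, lam)

lam = np.linspace(400.0, 700.0, 3001)    # wavelengths in nm, 0.1-nm steps
f = np.ones_like(lam)                     # hypothetical flat illuminant
r = (lam - 400.0) / 300.0                 # hypothetical reflectance rising toward the red end
h_broad = gaussian(lam, 600.0, 30.0)      # hypothetical broad "red" sensor
h_narrow = gaussian(lam, 550.0, 2.0)      # so narrow it acts almost like a delta function

print(round(sensor_response(lam, f, r, h_broad), 3))   # mixes reflectances over a wide band
print(round(sensor_response(lam, f, r, h_narrow), 3))  # 0.5, i.e. f(550) * r(550)
```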

So the sensor output is actually the integral of three different wavelength-dependent components: the source f, the surface reflectance r, and the sensor h_R. Surprisingly, only weighted combinations of three delta-function approximations to the different f(λ)h(λ), that is, δ(λ_R), δ(λ_G), and δ(λ_B), are necessary to produce the sensation of nearly all the colors. This result is displayed on a chromaticity diagram. Such a diagram is obtained by first normalizing the three sensor measurements:

$$r = \frac{R}{R+G+B}, \qquad g = \frac{G}{R+G+B}, \qquad b = \frac{B}{R+G+B} \qquad (2.33)$$

and then plotting perceived color as a function of any two (usually red and green). Chromaticity explicitly ignores intensity or brightness; it is a section through the three-dimensional color space (Fig. 2.7). The choice of (λ_R, λ_G, λ_B) = (410, 530, 650) nm maximizes the realizable colors, but some colors still cannot be realized since they would require negative values for some of r, g, and b.

Fig. 2.6 Spectral response of human color sensors.

Sec. 2.2 Image Model

Another more intuitive way of visualizing the possible colors from the RGB space is to view these measurements as Euclidean coordinates. Here any color can be visualized as a point in the unit cube. Other coordinate systems are useful for different applications; computer graphics has proved a strong stimulus for investigation of different color space bases.

Color Perception

Color perception is complex, but the essential step is a transformation of three input intensity measurements into another basis. The coordinates of the new basis are more directly related to human color judgments.

Fig. 2.7 (a) An artist's conception of the chromaticity diagram (see color insert); (b) a more useful depiction. Spectral colors range along the curved boundary; the straight boundary is the line of purples.

Although the RGB basis is good for the acquisition or display of color information, it is not a particularly good basis to explain the perception of colors. Human vision systems can make good judgments about the relative surface reflectance r(λ) despite different illuminating wavelengths; this reflectance seems to be what we mean by surface color.

Another important feature of the color basis is revealed by an ability to perceive in "black and white," effectively deriving intensity information from the color measurements. From an evolutionary point of view, we might expect that color perception in animals would be compatible with preexisting noncolor perceptual mechanisms.

These two needs (the need to make good color judgments and the need to retain and use intensity information) imply that we use a transformed, non-RGB basis for color space. Of the different bases in use for color vision, all are variations on this theme: intensity forms one dimension and color is a two-dimensional subspace. The differences arise in how the color subspace is described. We categorize such bases into two groups.

1. Intensity/Saturation/Hue (IHS). In this basis, we compute intensity as

$$\text{intensity} := R + G + B \qquad (2.34)$$

The saturation measures the lack of whiteness in the color. Colors such as "fire engine" red and "grass" green are saturated; pastels (e.g., pinks and pale blues) are desaturated. Saturation can be computed from RGB coordinates by the formula [Tenenbaum and Weyl 1975]

$$\text{saturation} := 1 - \frac{3\,\min(R, G, B)}{\text{intensity}} \qquad (2.35)$$

Hue is roughly proportional to the average wavelength of the color. It can be defined using RGB by the following program fragment:

$$\text{hue} := \cos^{-1}\left\{ \frac{\tfrac{1}{2}\left[(R-G) + (R-B)\right]}{\left[(R-G)^2 + (R-B)(G-B)\right]^{1/2}} \right\} \qquad (2.36)$$

if B > G then hue := 2π − hue

The IHS basis transforms the RGB basis in the following way. Thinking of the color cube, the diagonal from the origin to (1, 1, 1) becomes the intensity axis. Saturation is the distance of a point from that axis, and hue is the angle of the point about that axis, measured from some reference (Fig. 2.8).

This basis is essentially that used by artists [Munsell 1939], who term saturation chroma. Also, this basis has been used in graphics [Smith 1978; Joblove and Greenberg 1978].

One problem with the IHS basis, particularly as defined by (2.34) through (2.36), is that it contains essential singularities where it is impossible to define the color in a consistent manner [Kender 1976]. For example, hue has an essential singularity for all values of (R, G, B) where R = G = B. This means that special care must be taken in algorithms that use hue.

2. Opponent processes. The opponent process basis uses Cartesian rather than cylindrical coordinates for the color subspace, and was first proposed by Hering [Teevan and Birney 1961]. The simplest form of basis is a linear transformation from R, G, B coordinates. The new coordinates are termed "R − G", "Bl − Y", and "W − Bk":

$$\begin{bmatrix} R-G \\ Bl-Y \\ W-Bk \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ -1 & -1 & 2 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

Fig. 2.8 An IHS color space. (a) Cross section at one intensity; (b) cross section at one hue (see color inserts).
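The color formulas above translate directly into code. In this sketch, `chromaticity` implements (2.33), `rgb_to_ihs` implements (2.34) through (2.36), and `rgb_to_opponent` applies the simple opponent-process matrix. The function names, and the choice to return `None` where a quantity is undefined (hue on the gray axis R = G = B, and saturation and hue for black), are assumptions of this illustration rather than part of the original formulation; they make the singularity discussed above explicit instead of letting it produce a division by zero.

```python
import math

def chromaticity(R, G, B):
    """Normalized coordinates (r, g, b) of Eq. (2.33); they always sum to 1."""
    total = R + G + B
    if total == 0:
        raise ValueError("chromaticity is undefined for R = G = B = 0")
    return R / total, G / total, B / total

def rgb_to_ihs(R, G, B):
    """Intensity, saturation, and hue (radians) from Eqs. (2.34)-(2.36).
    Returns None for quantities that are undefined."""
    intensity = R + G + B                                     # (2.34)
    if intensity == 0:
        return intensity, None, None
    saturation = 1 - 3 * min(R, G, B) / intensity             # (2.35)
    # The radicand equals ((R-G)^2 + (R-B)^2 + (G-B)^2) / 2, so it is never negative
    # and is zero exactly when R = G = B: the singularity noted in the text.
    denom = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
    if denom == 0:
        return intensity, saturation, None
    ratio = 0.5 * ((R - G) + (R - B)) / denom
    hue = math.acos(max(-1.0, min(1.0, ratio)))               # (2.36), clamped against rounding
    if B > G:
        hue = 2 * math.pi - hue
    return intensity, saturation, hue

# Rows give R - G, Bl - Y, W - Bk as linear combinations of (R, G, B).
OPPONENT = ((1, -1, 0), (-1, -1, 2), (1, 1, 1))

def rgb_to_opponent(R, G, B):
    return tuple(a * R + b * G + c * B for a, b, c in OPPONENT)

print(chromaticity(2, 1, 1))           # (0.5, 0.25, 0.25)
print(rgb_to_ihs(0.5, 0.5, 0.5))       # (1.5, 0.0, None): gray has no defined hue
print(rgb_to_opponent(1, 1, 1))        # (0, 0, 3): white lies on the W - Bk axis only
```

With these definitions, pure red, green, and blue land at hues of 0, 2π/3, and 4π/3, evenly spaced around the intensity axis.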

The advocates of this representation, such as [Hurvich and Jameson 1957], theorize that this basis has neurological correlates and is in fact the way human beings represent ("name") colors. For example, in this basis it makes sense to talk about a "reddish blue" but not a "reddish green." Practical opponent process models usually have more complex weights in the transform matrix to account for psychophysical data. Some startling experiments [Land 1977] show our ability to make correct color judgments even when the illumination consists of only two principal wavelengths. The opponent process, at the level at which we have developed it, does not demonstrate how such judgments are made, but does show how stimulus at only two wavelengths will project into the color subspace. Readers interested in the details of the theory should consult the references.

Commercial television transmission needs an intensity, or "W − Bk", component for black-and-white television sets while still spanning the color space. The National Television System Committee (NTSC) uses a "YIQ" basis extracted from RGB via

$$\begin{bmatrix} I \\ Q \\ Y \end{bmatrix} = \begin{bmatrix} 0.60 & -0.28 & -0.32 \\ 0.21 & -0.52 & 0.31 \\ 0.30 & 0.59 & 0.11 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

This basis is a weighted form of (I, Q, Y) = ("R − cyan," "magenta − green," "W − Bk").
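The YIQ transform is a single matrix product, so whole images can be converted at once. A minimal NumPy sketch using the rounded weights printed above (the names `IQY_FROM_RGB` and `rgb_to_iqy` are illustrative, and RGB components are assumed to lie in [0, 1]). Because the I and Q rows each sum to zero and the Y row sums to one, any gray input maps to I = Q = 0, leaving all of its intensity in Y, which is exactly what a black-and-white receiver needs.

```python
import numpy as np

# Rounded NTSC weights as printed above; rows produce I, Q, Y from columns R, G, B.
IQY_FROM_RGB = np.array([
    [0.60, -0.28, -0.32],
    [0.21, -0.52,  0.31],
    [0.30,  0.59,  0.11],
])

def rgb_to_iqy(rgb):
    """Convert RGB values (last axis of length 3, components in [0, 1]) to (I, Q, Y)."""
    return np.asarray(rgb, dtype=float) @ IQY_FROM_RGB.T

print(rgb_to_iqy([1.0, 0.0, 0.0]))      # pure red: the first column of the matrix
gray_image = np.full((2, 2, 3), 0.4)    # a tiny uniform gray RGB image
print(np.round(rgb_to_iqy(gray_image)[0, 0], 6))  # I and Q vanish; Y carries the 0.4 intensity
```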
2.2.6 Digital Images

The digital images with which computer vision deals are represented by m-vector discrete-valued image functions f(x), usually of one, two, three, or four dimensions. Usually m = 1, and both the domain and range of f(x) are discrete. The domain of f is finite, usually a rectangle, and the range of f is nonnegative and bounded: 0 ≤ f(x) ≤ M for some integer M. For all practical purposes, the image is a continuous function which is represented by measurements or samples at regularly spaced intervals. At the time the image is sampled, the intensity is usually quantized into a number of different gray levels. For a discrete image, f(x) is an integer gray level, and x = (x, y) is a pair of integer coordinates representing a sample point in a two-dimensional image plane.

Sampling involves two important choices: (1) the sampling interval, which determines in a basic way whether all the information in the image is represented, and (2) the tessellation or spatial pattern of sample points, which affects important notions of connectivity and distance. In our presentation, we first show qualitatively the effects of sampling and gray-level quantization. Second, we discuss the simplest kinds of tessellations of the plane. Finally, and most important, we describe the sampling theorem, which specifies how close the image samples must be to represent the image unambiguously.

The choice of integers to represent the gray levels and coordinates is dictated by limitations in sensing. Also, of course, there are hardware limitations in representing images arising from their sheer size. Table 2.3 shows the storage required for an image in 8-bit bytes as a function of m, the number of bits per sample, and N, the linear dimension of a square image.

For reasons of economy (and others discussed in Chapter 3) we often use images of considerably less spatial resolution than that required to preserve fidelity to the human viewer. Figure 2.9 provides a qualitative idea of image degradation with decreasing spatial resolution. As shown in Table 2.3, another way to save space besides using less spatial resolution is to use fewer bits per gray-level sample.
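The entries of Table 2.3 are consistent with storing each sample in 1, 2, 4, or 8 bits (the smallest of these that holds m bits), so that samples never straddle a byte boundary: for example, m = 3 and m = 4 give identical storage. The sketch below reproduces the table under that inferred packing rule; the function names are hypothetical, and other packing schemes (such as a dense bit stream) would give smaller figures for m = 3, 5, 6, and 7.

```python
def stored_bits(m):
    """Bits actually occupied per sample: m rounded up to 1, 2, 4, or 8, so that
    samples never straddle a byte boundary (an assumption read off Table 2.3)."""
    if not 1 <= m <= 8:
        raise ValueError("this sketch handles 1 <= m <= 8 only")
    for width in (1, 2, 4, 8):
        if m <= width:
            return width

def storage_bytes(n, m):
    """Bytes needed for an N x N image with m bits per sample."""
    return n * n * stored_bits(m) // 8

sizes = (32, 64, 128, 256, 512)
print("m " + "".join(f"{n:>10}" for n in sizes))
for m in range(1, 9):
    print(f"{m:<2}" + "".join(f"{storage_bytes(n, m):>10,}" for n in sizes))
```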
Figure 2.10 shows an image represented with different numbers of bits per sample. One striking effect is the "contouring" introduced with small numbers of gray levels. This is, in general, a problem for computer vision algorithms, which cannot easily discount the false contours. The choice of spatial and gray-level resolution for any particular computer vision task is an important one which depends on many factors. It is typical in computer vision to have to balance the desire for increased resolution (both gray-scale and spatial) against its cost. Better data can often make algorithms easier to write, but a small amount of data can make processing more efficient. Of course, the image domain, choice of algorithms, and image characteristics all heavily influence the choice of resolutions.

Fig. 2.9 Using different numbers of samples. (a) N = 16; (b) N = 32; (c) N = 64; (d) N = 128; (e) N = 256; (f) N = 512.

Table 2.3 NUMBER OF 8-BIT BYTES OF STORAGE FOR VARIOUS VALUES OF N AND m

m \ N        32        64       128       256       512
  1         128       512     2,048     8,192    32,768
  2         256     1,024     4,096    16,384    65,536
  3         512     2,048     8,192    32,768   131,072
  4         512     2,048     8,192    32,768   131,072
  5       1,024     4,096    16,384    65,536   262,144
  6       1,024     4,096    16,384    65,536   262,144
  7       1,024     4,096    16,384    65,536   262,144
  8       1,024     4,096    16,384    65,536   262,144

Fig. 2.10 Using different numbers of bits per sample. (a) m = 1; (b) m = 2; (c) m = 4; (d) m = 8.

Tessellations and Distance Metrics

Although the spatial samples for f(x) can be represented as points, it is more satisfying to the intuition and a closer approximation to the acquisition process to think of these samples as finite-sized cells of constant gray level partitioning the image. These cells are termed pixels, an acronym for picture elements. The pattern into which the plane is divided is called its tessellation. The most common regular tessellations of the plane are shown in Fig. 2.11.

Fig. 2.11 Different tessellations of the image plane. (a) Rectangular; (b) triangular; (c) hexagonal.

Although rectangular tessellations are almost universally used in computer vision, they have a structural problem known as the "connectivity paradox." Given a pixel in a rectangular tessellation, how should we define the pixels to which it is connected? Two common ways are four-connectivity and eight-connectivity, shown in Fig. 2.12.

However, each of these schemes has complications. Consider Fig. 2.12c, consisting of a black object with a hole on a white background. If we use four-connectedness, the figure consists of four disconnected pieces, yet the hole is separated from the "outside" background. Alternatively, if we use eight-connectedness, the figure is one connected piece, yet the hole is now connected to the outside. This paradox poses complications for many geometric algorithms. Triangular and hexagonal tessellations do not suffer from connectivity difficulties (if we use three-connectedness for triangles); however, distance can be more difficult to compute on these arrays than for rectangular arrays.

The distance between two pixels in an image is an important measure that is fundamental to many algorithms. In general, a distance d is a metric. That is,

(1) d(x, y) = 0 if and only if x = y
(2) d(x, y) = d(y, x)
(3) d(x, y) + d(y, z) ≥ d(x, z)

For square arrays with unit spacing between pixels, we can use any of the following common distance metrics (Fig. 2.13) for two pixels x = (x₁, y₁) and y = (x₂, y₂).

Euclidean:

$$d_e(\mathbf{x}, \mathbf{y}) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \qquad (2.37)$$

City block:

$$d_{cb}(\mathbf{x}, \mathbf{y}) = |x_1 - x_2| + |y_1 - y_2| \qquad (2.38)$$

Chessboard:

$$d_{ch}(\mathbf{x}, \mathbf{y}) = \max\left(|x_1 - x_2|,\ |y_1 - y_2|\right) \qquad (2.39)$$
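Both the three metrics (2.37) through (2.39) and the connectivity paradox can be reproduced in a few lines. The sketch below counts connected regions with a breadth-first search under either neighborhood; the 5 × 5 test figure is a hypothetical stand-in for Fig. 2.12c (four black pixels that touch only at their corners, enclosing a one-pixel white hole), and all names are illustrative.

```python
from collections import deque
import math

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])            # (2.37)

def city_block(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])             # (2.38)

def chessboard(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))         # (2.39)

N4 = ((-1, 0), (1, 0), (0, -1), (0, 1))                    # four-connectivity
N8 = N4 + ((-1, -1), (-1, 1), (1, -1), (1, 1))             # eight-connectivity

def count_components(image, value, neighbors):
    """Count connected regions of pixels equal to `value`, via breadth-first search."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

# Hypothetical figure in the spirit of Fig. 2.12c: black = 1, white = 0.
RING = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

print(euclidean((0, 0), (3, 4)), city_block((0, 0), (3, 4)), chessboard((0, 0), (3, 4)))  # 5.0 7 4
print(count_components(RING, 1, N4), count_components(RING, 0, N4))  # 4 2: four pieces, hole closed off
print(count_components(RING, 1, N8), count_components(RING, 0, N8))  # 1 1: one piece, hole leaks out
```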

Other definitions are possible, and all such measures extend to multiple dimensions. The tessellation of higher-dimensional space into pixels usually is confined to (n-dimensional) cubical pixels.

Fig. 2.12 Connectivity paradox for rectangular tessellations. (a) A central pixel and its 4-connected neighbors; (b) a pixel and its 8-connected neighbors; (c) a figure with ambiguous connectivity.

Fig. 2.13 Equidistant contours for different metrics.

The Sampling Theorem

Consider the one-dimensional "image" shown in Fig. 2.14. To digitize this image one must sample the image function. These samples will usually be separated at regular intervals as shown. How far apart should these samples be to allow reconstruction (to a given accuracy) of the underlying continuous image from its samples? This question is answered by the Shannon sampling theorem. An excellent rigorous presentation of the sampling theorem may be found in [Rosenfeld and Kak 1976]. Here we shall present a shorter graphical interpretation using the results of Table 2.2. For simplicity we consider the image to be periodic in order to avoid small edge effects introduced by the finite image domain. A more rigorous treatment, which considers these effects, is given in [Andrews and Hunt 1977].

Fig. 2.14 One-dimensional image and its samples.

Suppose that the image is sampled with a "comb" function of spacing x₀ (see Table 2.2). Then the sampled image can be modeled by

$$f_s(x) = f(x) \sum_{n=-\infty}^{\infty} \delta(x - n x_0) \qquad (2.40)$$

where the image function modulates the comb function. Equivalently, this can be written as

$$f_s(x) = \sum_{n=-\infty}^{\infty} f(n x_0)\, \delta(x - n x_0) \qquad (2.41)$$

The right-hand side of Eq. (2.40) is the product of two functions, so that property (6) in Table 2.1 is appropriate: the Fourier transform of f_s(x) is equal to the convolution of the transforms of each of the two functions. Using this result yields

$$F_s(u) = F(u) * \frac{1}{x_0} \sum_{n=-\infty}^{\infty} \delta\!\left(u - \frac{n}{x_0}\right) \qquad (2.42)$$

But from Eq. (2.3),

$$F(u) * \delta\!\left(u - \frac{n}{x_0}\right) = F\!\left(u - \frac{n}{x_0}\right) \qquad (2.43)$$

so that

$$F_s(u) = \frac{1}{x_0} \sum_{n=-\infty}^{\infty} F\!\left(u - \frac{n}{x_0}\right) \qquad (2.44)$$

Therefore, sampling the image function f(x) at intervals of x₀ is equivalent in the frequency domain to replicating the transform of f at intervals of 1/x₀. This limits the recovery of f(x) from its sampled representation, f_s(x). There are two basic situations to consider. If the transform of f(x) is bandlimited such that F(u) = 0 for |u| > 1/(2x₀), then there is no overlap between successive replications of F(u) in the frequency domain. This is shown for the case of Fig. 2.15a, where we have arbitrarily used a triangular-shaped image transform to illustrate the effects of sampling. Incidentally, note that for this transform F(u) = F(−u) and that it has no imaginary part; from Table 2.2, the one-dimensional image must also be real and even. Now if F(u) is not bandlimited, i.e., there are values |u| > 1/(2x₀) for which F(u) ≠ 0, then components of different replications of F(u) will interact to produce the composite function F_s(u), as shown in Fig. 2.15b. In the first case f(x) can be recovered from F_s(u) by multiplying F_s(u) by a suitable G(u), whose height x₀ cancels the factor 1/x₀ in (2.44):

$$G(u) = \begin{cases} x_0 & |u| < \dfrac{1}{2x_0} \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (2.45)$$

Then

$$f(x) = \mathcal{F}^{-1}\left[F_s(u)\, G(u)\right] \qquad (2.46)$$
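Because the text treats the image as periodic, the multiplication by G(u) in (2.45) and the inverse transform in (2.46) can be sketched with the discrete Fourier transform: padding the spectrum with zeros keeps every frequency below the limit 1/(2x₀) and adds nothing above it, and the longer inverse transform then evaluates the reconstruction on a finer grid. The test signal and sampling intervals are hypothetical; NumPy's `np.fft.irfft` zero-pads its input when asked for a longer output, which is what supplies the ideal low-pass step here.

```python
import numpy as np

def reconstruct_periodic(samples, upsample):
    """Evaluate the band-limited periodic signal through `samples` at `upsample` times the
    original density. Zero-padding the spectrum plays the role of the ideal low-pass G(u);
    scaling by `upsample` undoes NumPy's 1/n normalization of the inverse DFT.
    Assumes no energy sits exactly at the highest (Nyquist) frequency bin."""
    n = len(samples)
    spectrum = np.fft.rfft(samples)
    return np.fft.irfft(spectrum, n * upsample) * upsample

def image(x):
    # hypothetical 1-D periodic "image": components at 3 and 5 cycles per unit length
    return np.cos(2 * np.pi * 3 * x) + 0.5 * np.sin(2 * np.pi * 5 * x)

fine_x = np.arange(64) / 64

ok_x = np.arange(16) / 16      # x0 = 1/16: limit 1/(2 x0) = 8 cycles > 5, no overlap
print(np.allclose(reconstruct_periodic(image(ok_x), 4), image(fine_x)))      # True

coarse_x = np.arange(8) / 8    # x0 = 1/8: limit 4 cycles < 5, replicas overlap
print(np.allclose(reconstruct_periodic(image(coarse_x), 8), image(fine_x)))  # False
```

In the undersampled case the 5-cycle component folds onto 3 cycles, so the reconstruction is a perfectly smooth signal, just not the original one.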

However, in the second case, F_s(u)G(u) is very different from the original F(u). This is shown in Fig. 2.15c. Sampling an image whose transform F(u) is not bandlimited allows information at high spatial frequencies to interfere with that at low frequencies, a phenomenon known as aliasing.

Thus the sampling theorem has this very important result: As long as the image contains no spatial frequencies greater than one-half the sampling frequency, the underlying continuous image is unambiguously represented by its samples. However, lest one be tempted to insist on images that have been so sampled, note that it may be useful to sample at lower frequencies than would be required for total reconstruction. Such sampling is usually preceded by some form of blurring of the image, or can be incorporated with such blurring (by integrating the image intensity over a finite area for each sample). Image blurring can bury irrelevant details, reduce certain forms of noise, and also reduce the effects of aliasing.

Fig. 2.15 (a) F(u) bandlimited so that F(u) = 0 for |u| > 1/(2x₀). (b) F(u) not bandlimited as in (a). (c) Reconstructed transform.
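Aliasing, and the way blurring before sampling suppresses it, can be seen with a few synthetic signals. In the sketch below (signals, rates, and names are hypothetical), a cosine at 0.8 cycles per sample produces exactly the same samples as one at 0.2 cycles per sample. Then fine detail at 0.25 cycles per sample is subsampled by a factor of 4: plain subsampling turns it into a false constant, whereas averaging each block of four samples (a crude form of "integrating the image intensity over a finite area for each sample") cancels it. Block averaging is only one simple blur, not an ideal low-pass filter.

```python
import numpy as np

def apparent_frequency(samples, spacing=1.0):
    """Frequency (cycles per unit length) of the strongest non-DC DFT component."""
    magnitudes = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=spacing)
    return freqs[1 + np.argmax(magnitudes[1:])]

n = np.arange(100)
fast = np.cos(2 * np.pi * 0.8 * n)   # 0.8 cycles/sample, above the 0.5 limit
slow = np.cos(2 * np.pi * 0.2 * n)   # 0.2 cycles/sample
print(np.allclose(fast, slow))       # True: the samples cannot tell them apart
print(apparent_frequency(fast))      # 0.2

def subsample(signal, k):
    """Keep every k-th sample, with no prefiltering."""
    return signal[::k]

def block_average(signal, k):
    """Average each run of k samples: a crude blur folded into the sampling step."""
    usable = len(signal) - len(signal) % k
    return signal[:usable].reshape(-1, k).mean(axis=1)

detail = np.cos(2 * np.pi * 0.25 * np.arange(400))     # fine detail, 4 samples per cycle
print(subsample(detail, 4)[:3])                         # [1. 1. 1.]: aliased into a false constant
print(np.abs(block_average(detail, 4)).max() < 1e-9)    # True: the averaging removes it
```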

2.3 IMAGING DEVICES FOR COMPUTER VISION

There is a vast array of methods for obtaining a digital image in a computer. In this section we have in mind only "traditional" images produced by various forms of radiation impinging on a sensor after having been affected by physical objects.

Many sensors are best modeled as analog devices whose response must be digitized for computer representation. The types of imaging devices possible are limited only by the technical ingenuity of their developers; attempting a definitive taxonomy is probably unwise. Figure 2.16 is a flowchart of devices, information structures, and processes addressed in this and succeeding sections.

Fig. 2.16 Imaging devices (boxes), information structures (rectangles), and processes (circles).

When the image already exists in some form, or physical considerations limit choice of imaging technology, the choice of digitizing technology may still be open. Most images are carried on a permanent medium, such as film, or at least are available in (essentially) analog form to a digitizing device. Generally, the relevant technical characteristics of imaging or digitizing devices should be foremost in mind when a technique is being selected. Such considerations as the signal-to-noise ratio of the device, its resolution, the speed at which it works, and its expense are important issues.
Sec.2.3 Imaging Devices for Computer Vision 43

2.3.1 Photographic Imaging

The camera is the most familiar producer of optical images on a permanent medium. We shall not address here the multitudes of still and movie camera options; rather, we briefly treat the characteristics of the photographic film and of the digitizing devices that convert the image to machine-readable form. More on these topics is well presented in the References.

Photographic (black-and-white) film consists of an emulsion of silver halide crystals on a film base. (Several other layers are identifiable, but are not essential to an understanding of the relevant properties of film.) Upon exposure to light, the silver halide crystals form development centers, which are small grains of metallic silver. The photographic development process extends the formation of metallic silver to the entire silver halide crystal, which thus becomes a binary ("light" or "no light") detector. Subsequent processing removes undeveloped silver halide. The resulting film negative is dark where many crystals were developed and light where few were. The resolution of the film is determined by the grain size, which depends on the original halide crystals and on development techniques. Generally, the faster the film (the less light needed to expose it), the coarser the grain. Film exists that is sensitive to infrared radiation; x-ray film typically has two emulsion layers, giving it more gray-level range than that of normal film.

A repetition of the negative-forming process is used to obtain a photographic print. The negative is projected onto photographic paper, which responds roughly in the same way as the negative. Most photographic print paper cannot capture in one print the range of densities that can be present in a negative. Positive films do exist that do not require printing; the most common example is color slide film.

The response of film to light is not completely linear. The photographic density obtained by a negative is defined as the logarithm (base 10) of the ratio of incident light to transmitted light:

$$D = \log_{10} \frac{\text{incident light}}{\text{transmitted light}}$$

The exposure of a negative dictates (approximately) its response.
Exposure is defined as the energy per unit area that exposed the film (in its sensitive spectral range). Thus exposure is the product of the intensity and the time of exposure. This mathematical model of the behavior of the photographic exposure process is correct for a wide operating range of the film, but reciprocity failure effects in the film keep one from being able always to trade light level for exposure time. At very low light levels, longer exposure times are needed than are predicted by the product rule.

The response of film to light is usually plotted in an "H&D curve" (named for Hurter and Driffield), which plots density versus the logarithm of exposure. The H&D curve of film displays many of its important characteristics. Figure 2.17 exhibits a typical H&D curve for a black-and-white film.

The toe of the curve is the lower region of low slope. It expresses reciprocity failure and the fact that the film has a certain bias, or fog response, which dominates its behavior at the lowest exposure levels. As one would expect, there is an upper limit to the density of the film, attained when a maximum number of silver halide crystals are rendered developable. Increasing exposure beyond this maximum level has little effect, accounting for the shoulder in the H&D curve, or its flattened upper end.

Fig. 2.17 Typical H & D curve (density versus log exposure).

In between the toe and shoulder, there is typically a linear operating region of the curve. High-contrast films are those with high slope (traditionally called gamma); they respond dramatically to small changes in exposure. A high-contrast film may have a gamma between about 1.5 and 10. Films with gammas of approximately 10 are used in graphic arts to copy line drawings. General-purpose films have gammas of about 0.5 to 1.0.

The resolution of a general film is about 40 lines/mm, which means that a 1400 × 1400 image may be digitized from a 35-mm slide. At any greater sampling frequency, the individual film grains will occupy more than a pixel, and the resolution will thus be grain-limited.

Image Digitizers (Scanners)

Accuracy and speed are the main considerations in converting an image on film into digital form. Accuracy has two aspects: spatial resolution, loosely the level of image spatial detail to which the digitizer can respond, and gray-level resolution, defined generally as the range of densities or reflectances to which the digitizer responds and how finely it divides the range. Speed is also important because usually many data are involved; images of 1 million samples are commonplace.

Digitizers broadly take two forms: mechanical and "flying spot."
In a mechanical digitizer, the film and a sensing assembly are mechanically transported past one another while readings are made. In a flying-spot digitizer, the film and sensor are static. What moves is the "flying spot," which is a point of light on the face of a cathode-ray tube, or a laser beam directed by mirrors. In all digitizers a very narrow beam of light is directed through the film or onto the print at a known coordinate point. The light transmittance or reflectance is measured, transformed from analog to digital form, and made available to the computer through interfacing electronics. The location on the medium where density is being measured may also be transmitted with each reading, but it is usually determined by relative offset from positions transmitted less frequently. For example, a "new scan line" impulse is transmitted for TV output; the position along the current scan line yields an x position, and the number of scan lines yields a y position.
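The film quantities introduced above (density, gamma as the slope of the H&D curve's linear region, and the grain-limited sampling density) reduce to short calculations. A sketch with hypothetical function names and made-up readings; the gamma estimate is simply a least-squares line through (log₁₀ exposure, density) pairs, which is only meaningful for points between the toe and the shoulder.

```python
import numpy as np

def density(incident, transmitted):
    """Photographic density D = log10(incident light / transmitted light)."""
    return np.log10(np.asarray(incident, dtype=float) / np.asarray(transmitted, dtype=float))

def estimate_gamma(log_exposure, dens):
    """Slope of a least-squares line through (log10 exposure, density) pairs.
    Only meaningful for points taken from the linear region between toe and shoulder."""
    slope, _intercept = np.polyfit(log_exposure, dens, 1)
    return slope

def grain_limited_samples(width_mm, lines_per_mm):
    """Samples across the film beyond which each film grain would span more than one pixel."""
    return int(round(width_mm * lines_per_mm))

print(density(100.0, 1.0))               # 2.0: the film passes 1% of the incident light
log_e = np.array([0.5, 1.0, 1.5, 2.0])   # hypothetical readings from the straight-line region
print(round(estimate_gamma(log_e, 0.3 + 0.8 * log_e), 6))   # 0.8
print(grain_limited_samples(35, 40))     # 1400, as for the 35-mm slide in the text
```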

The mechanical scanners are mostly of two types, flatbed and drum. In a flatbed digitizer, the film is laid flat on a surface over which the light source and the sensor (usually a very accurate photoelectric cell) are transported in a raster fashion. In a drum digitizer, the film is fastened to a circular drum which revolves as the sensor and light source are transported down the drum parallel to its axis of rotation. Color mechanical digitizers also exist; they work by using colored filters, effectively extracting in three scans three "color overlays" which when superimposed would yield the original color image. Extracting some "composite" color signal with one reading presents technical problems and would be difficult to do as accurately.

Satellite Imagery

LANDSAT and ERTS (Earth Resources Technology Satellites) have similar scanners which produce images of 2340 × 3380 7-bit pixels in four spectral bands, covering an area of 100 × 100 nautical miles. The scanner is mechanical, scanning six horizontal scan lines at a time; the rotation of the earth accounts for the advancement of the scan in the vertical direction.

A set of four images is shown in Fig. 2.18. The four spectral bands are numbered 4, 5, 6, and 7. Band 4 [0.5 to 0.6 μm (green)] accentuates sediment-laden water and shallow water; band 5 [0.6 to 0.7 μm (red)] emphasizes cultural features such as roads and cities; band 6 [0.7 to 0.8 μm (near infrared)] emphasizes vegetation and accentuates the contrast between land and water; and band 7 [0.8 to 1.1 μm (near infrared)] is like band 6 except that it is better at penetrating atmospheric haze.

The LANDSAT images are available at nominal cost from the U.S. government (The EROS Data Center, Sioux Falls, South Dakota 57198). They are furnished on tape, and cover the entire surface of the earth (often the buyer has a choice of the amount of cloud cover). These images form a huge database of multispectral imagery, useful for land-use and geological studies; they furnish something of an image analysis challenge, since one satellite can produce some 6 billion bits of image data per day.
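A back-of-the-envelope check puts the LANDSAT figures above in proportion. This sketch assumes uncompressed 7-bit samples, ignores tape formatting and header overhead, and reads "some 6 billion bits" as 6 × 10⁹; the function name is illustrative.

```python
def scene_bits(width, height, bits_per_pixel, bands):
    """Raw size of one multispectral scene, in bits."""
    return width * height * bits_per_pixel * bands

landsat = scene_bits(2340, 3380, 7, 4)
print(f"{landsat:,} bits per scene")          # 221,457,600 bits per scene
print(round(6e9 / landsat, 1))                 # 27.1 four-band scenes' worth of data per day
```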
Fig. 2.18 The straits of Juan de Fuca as seen by the LANDSAT multispectral scanner. (a) Band 4; (b) band 5; (c) band 6; (d) band 7.

Television Imaging

Television cameras are appealing devices for computer vision applications for several reasons. For one thing, the image is immediate; the camera can show events as they happen. For another, the image is already in electrical, if not digital, form. "Television camera" is basically a nontechnical term, because many different technologies produce video signals conforming to the standards set by the FCC and NTSC. Cameras exist with a wide variety of technical specifications.

Usually, TV cameras have associated electronics which scan an entire "picture" at a time. This operation is closely related to broadcast and receiver standards, and is more oriented to human viewing than to computer vision. An entire image (of some 525 scan lines in the United States) is called a frame, and consists of two fields, each made up of alternate scan lines from the frame. These fields are generated and transmitted sequentially by the camera electronics. The transmitted image is thus interlaced, with all odd-numbered scan lines being "painted" on the screen alternating with all even-numbered scan lines. In the United States, each field takes 1/60 sec to scan, so a whole frame is scanned every 1/30 sec. The interlacing is largely to prevent flickering of the image, which would become noticeable if the frame were painted from top to bottom only once every 1/30 sec. These automatic scanning electronics may be replaced or overridden in many cameras, allowing "random access" to the image. In some technologies, such as the image dissector, the longer the signal is collected from any location, the better the signal-to-noise performance.

There are a number of different systems used to generate television images. We discuss five main methods below.

Image orthicon tube. This is one of the two main methods in use today (in addition to the vidicon). It offers very stable performance at all incident light levels and is widely used in commercial television. It is a storage-type tube, since it depends on the neutralization of positive charges by a scanning electron beam.

The image orthicon (Fig. 2.19) is divided into an imaging and a readout section. In the imaging section, light from the scene is focused onto a semitransparent photocathode. This photocathode operates the same way as the cathode in a phototube. It emits electrons which are magnetically focused by a coil and are accelerated toward a positively charged target. The target is a thin glass disk with a fine wire-mesh screen facing the photocathode. When electrons strike it, secondary emission from the glass takes place. As electrons are emitted from the photocathode side of the disk, positive charges build up on the scanning side. These charges correspond to the pattern of light intensity in the scene being viewed.

In the readout section, the back of the target is scanned by a low-velocity electron beam from an electron gun at the rear of the tube. Electrons in this beam are absorbed by the target in varying amounts, depending on the charge on the target. The image is represented by the amplitude-modulated intensity of the returned beam.

Vidicon tube. The vidicon is smaller, lighter, and more rugged than the image orthicon, making it ideal for portable use. Here the target (the inner surface of the faceplate) is coated with a transparent conducting film which forms a video signal electrode (Fig. 2.20). A thin photosensitive layer is deposited on the film, consisting of a large number of tiny resistive globules whose resistance decreases on illumination. This layer is scanned in raster fashion by a low-velocity electron beam from the electron gun at the rear of the tube. The beam deposits electrons on the layer, thus reducing its surface potential. The two surfaces of the target essentially form a capacitor, and the scanning action of the beam produces a capacitive current at the video signal electrode which represents the video signal.

The plumbicon is essentially a vidicon with a lead oxide photosensitive layer. It offers the following advantages over the vidicon: higher sensitivity, lower dark current, and negligible persistence or lag.
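The frame/field relationship described under Television Imaging can be sketched with array slicing. This is a deliberate simplification: a frame is treated as a plain array of 525 rows, whereas real broadcast fields involve half-line offsets and blanking intervals that are not modeled here. Names are illustrative.

```python
import numpy as np

FIELD_TIME_S = 1 / 60            # U.S. figure from the text
FRAME_TIME_S = 2 * FIELD_TIME_S  # two fields per frame: 1/30 sec

def split_fields(frame):
    """Return (odd field, even field). Scan lines are numbered from 1, so the
    odd-numbered lines are array rows 0, 2, 4, ..."""
    frame = np.asarray(frame)
    return frame[0::2], frame[1::2]

def interleave_fields(odd_field, even_field):
    """Rebuild a full frame by alternating lines from the two fields."""
    rows = odd_field.shape[0] + even_field.shape[0]
    frame = np.empty((rows,) + odd_field.shape[1:], dtype=odd_field.dtype)
    frame[0::2] = odd_field
    frame[1::2] = even_field
    return frame

frame = np.arange(525 * 4).reshape(525, 4)    # 525 scan lines, 4 samples each (toy width)
odd, even = split_fields(frame)
print(odd.shape, even.shape)                   # (263, 4) (262, 4)
print(np.array_equal(interleave_fields(odd, even), frame))  # True
```

One practical consequence for computer vision: if the scene moves between the two field times, the interleaved frame shows comb-like artifacts, so some applications process one field at a time.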

Fig. 2.19 The image orthicon.


Fig. 2.20 The vidicon.

Iconoscope tube. The iconoscope is now largely of historical interest. In it, an electron beam scans a target consisting of a thin mica sheet or mosaic coated with a photosensitive layer. In contrast to the vidicon and orthicon, the electron beam and the light both strike the same side of the target surface. The back of the mosaic is covered with a conductive film connected to an output load. The arrangement is equivalent to a matrix of small capacitors which discharge through a common lead.

Image dissector tube. The image dissector tube operates on instantaneous scanning rather than by neutralizing positive charges. Light from the scene is focused on a cathode coated with a photosensitive layer (Fig. 2.21). The cathode emits electrons in proportion to the amount of light striking it. These electrons are accelerated toward a target by the anode. The target is an electron multiplier covered by a small aperture which allows only a small part of the "electron image" emitted by the cathode to reach the target. The electron image is focused by a focusing coil that produces an axial magnetic field. The deflection coils then scan the electron image past the target aperture, where the electron multiplier produces a varying voltage representing the video signal. The image is thus "dissected" as it is scanned past the target, in an electronic version of a flatbed digitizing process.

Charge transfer devices. A more recent development in image formation is that of solid-state image sensors, known as charge transfer devices (CTDs). There are two main classes of CTDs: charge-coupled devices (CCDs) and charge injection devices (CIDs).

CCDs resemble MOSFETs (metal-oxide-semiconductor field-effect transistors) in that they contain a "source" region and a "drain" region coupled by a depletion-region channel (Fig. 2.22). For imaging purposes, they can be considered as a monolithic array of closely spaced MOS capacitors forming a shift register (Fig. 2.23). Charges in the depletion region are transferred to the output by applying a series of clocking pulses to a row of electrodes between the source and the drain.
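The shift-register readout of a CCD row can be illustrated with a toy model: on each clock pulse, the packet nearest the output is sensed and every remaining packet moves one well closer. The `transfer_efficiency` parameter is an illustrative assumption, and discarding the charge left behind on each transfer is a simplification made only for this sketch; it shows why packets that must travel farther arrive with less charge, which is the "charge transfer loss" that CIDs avoid.

```python
def clock_out(charges, transfer_efficiency=1.0):
    """Toy line-transfer readout. The output node sits beside well 0; on each clock the
    packet next to the output is sensed, then every remaining packet moves one well
    closer, keeping only `transfer_efficiency` of its charge (the rest is dropped here)."""
    wells = [float(q) for q in charges]
    readout = []
    while wells:
        readout.append(wells.pop(0))
        wells = [q * transfer_efficiency for q in wells]
    return readout

print(clock_out([5, 0, 7]))               # [5.0, 0.0, 7.0]: lossless transfer preserves the row
print(clock_out([100, 100, 100], 0.9))    # packets farther from the output arrive with less charge
```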
Photons incident on the semiconductor generate a series of charges on the CCD array. They are transferred to an output register either directly one line at a time (line transfer) or via a temporary storage area (frame transfer). The storage area is needed in frame transfer because the CCD array is scanned more rapidly than the output can be directly accommodated.

Fig. 2.21 Image dissector.

Charge injection devices (CIDs) resemble CCDs except that during sensing the charge is confined to the image site where it was generated (Fig. 2.24). The charges are read using an X-Y addressing technique similar to that used in computer memories. Basically, the stored charge is "injected" into the substrate and the resulting displacement current is detected to create the video signal.

CTD technology offers a number of advantages over conventional tube-type cameras: light weight, small size, low power consumption, resistance to burn-in, low blooming, low dark current, high sensitivity, wide spectral and dynamic range, and lack of persistence. CIDs have the further advantages over CCDs of tolerance to processing defects, simple mechanization, avoidance of charge transfer losses, and minimized blooming. CTD cameras are now available commercially.

Analog-to-Digital Conversion

With current technology, the representation of an image as an analog electrical waveform is usually an unavoidable precursor to further processing. Thus the operation of deriving a digital representation of an analog voltage is basic to computer vision input devices.
Fig. 2.22 Charge-coupled device. (Labels: one element, clock, N-type silicon.)

Fig. 2.23 A CCD array (line transfer). (Labels: 32 × 44-element bucket-brigade photosensitive array, horizontal clock, output amplifier, video out.)

The function of an analog-to-digital (A/D) converter is to take as input a voltage such as a video signal and to produce as output a representation of the voltage in digital memory, suitable for reading by an interface to a digital computer. The quality of an A/D converter is measured by its temporal resolution (the speed at which it can perform conversions) and the accuracy of its digital output.

Fig. 2.24 A CID array. (Labels: photosensitive element, charge transfer and holding elements, horizontal register, video out.)

Analog-to-digital converters are being produced as integrated circuit chips, but high-quality models are still expensive. The output precision is usually in the 8- to 12-bit range. It is quite possible to digitize an entire frame of a TV camera (i.e., approximately 525 scan lines by 300 or so samples along a scan line) in a single frame time (1/30 sec in the United States). Several commercial systems can provide such fast digitization into a "frame buffer" memory, along with raster graphics display capabilities from the same frame buffer, and "video-rate processing" of the digital data. The latter term refers to any of various low-level operations (such as averaging, convolution with small templates, image subtraction) which may be performed as fast as the images are acquired.

One inexpensive alternative to digitizing entire TV frames at once is to use an interface that acquires the TV signal for a particular point when the scan passes the requested location. With efficient programming, this point-by-point digitization can acquire an entire frame in a few seconds.
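As a rough illustration of the two quality measures of an A/D converter, the sketch below models an ideal uniform quantizer (accuracy) and the conversion rate needed to digitize a whole frame in one frame time (speed). It is a minimal sketch under stated assumptions: the 0 to 1 V input range, the clipping of out-of-range inputs, and the function names `quantize` and `required_sample_rate` are illustrative choices, not a description of any particular converter.

```python
def quantize(voltage, v_min, v_max, bits):
    """Ideal uniform quantizer (an assumption, not a specific device).

    The range [v_min, v_max] is split into 2**bits equal steps and the
    input is mapped to the index of its step; inputs outside the range
    are clipped to the lowest or highest code.
    """
    levels = 2 ** bits
    step = (v_max - v_min) / levels
    code = int((voltage - v_min) // step)
    return max(0, min(levels - 1, code))


def required_sample_rate(lines, samples_per_line, frames_per_sec):
    """Conversions per second needed to digitize whole frames in real time."""
    return lines * samples_per_line * frames_per_sec


# A 0.35 V sample of a hypothetical 0-1 V video signal at 8 and 12 bits.
print(quantize(0.35, 0.0, 1.0, 8))         # 89
print(quantize(0.35, 0.0, 1.0, 12))        # 1433
# About 525 lines x 300 samples per line, 30 frames per second.
print(required_sample_rate(525, 300, 30))  # 4725000
```

With these illustrative numbers, full-frame digitization in one frame time needs roughly 4.7 million conversions per second.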
2.3.2 Sensing Range

The third dimension may be derived from binocular images by triangulation, as we saw earlier, or inferred from single monocular visual input by a variety of "depth cues," such as size and occlusion. Specialized technology exists to acquire "depth images" directly and reliably. Here we outline two such techniques: "light striping," which is based on triangulation, and "spot ranging," which is based on different principles.

Light Striping

Light striping is a particularly simple case of the use of structured light [Will and Pennington 1971]. The basic idea is to use geometric information in the illumination to help extract geometric information from the scene. The spatial frequencies and angles of bars of light falling on a scene may be clustered to find faces; randomly structured light may allow blank, featureless surfaces to be matched in stereo views; and so forth.

Many researchers [Popplestone et al. 1975; Agin 1972; Sugihara 1977] have used striping to derive three dimensions. In light striping, a single plane of light is projected onto a scene, which causes a stripe of light to appear on the scene (Fig. 2.25). Only the part of the scene illuminated by the plane is sensed by the vision system. This restricts the "image" to be an essentially one-dimensional entity, and simplifies matching corresponding points. The plane itself has a known position (equation in world coordinates), determinable by any number of methods involving either the measurement of the projecting device or the measurement of the final resulting plane of light. Every image point determines a single "line of sight" in three-space upon which the world point that produces the image point must lie. This line is determined by the focal point of the imaging system and the image point upon which the world point projects. In a light-striping system, any point that is sensed in the image is also guaranteed to lie on the light plane in three-space. But the light plane and the line of sight intersect in just one point (as long as the camera's focal point is not in the light plane). Thus by computation of the intersection of the line of sight with the plane of light, we derive the three-dimensional point that corresponds to any image point visible as part of a stripe.

Fig. 2.25 Light striping. (a) A typical arrangement (projector, light plane); (b) raw data; (c) data segmented into strips; (d) strips segmented into two surfaces.

The plane of light may result from a laser or from the projection of a slit. Only the light stripe should be visible to the imaging device; unless a laser is used, this implies a darkened room. If a camera is fitted with the proper filter, a laser-based system can be operated in normal light. Another advantage of the laser is that it can be focused into a narrower plane than can a slit image.

The only points whose three-dimensional coordinates can be computed are those that can be "seen" by both the light stripe source and the camera at once. Since there must be a nonzero baseline if triangulation is to derive three-dimensional information, the camera cannot be too close to the projector, and thus concavities in the scene are potential trouble spots, since both the striper and the camera may not be able to "see" into them. Surfaces in the scene that are nearly parallel with the light plane will have a relatively small number of stripes projected onto them by any uniform stripe placement strategy. This problem is ameliorated by striping with two sets of parallel planes at right angles to each other [Agin 1972].

A major advantage of light striping over spot ranging is that (barring shadows) its continuity and discontinuity indicate similar conditions on the surface. It is easy to "segment" stripe images (Part II): Stripes falling on the same surface may easily be gathered together. This set of related stripes may be used in a number of ways to derive further information on the characteristics of the surface (Fig. 2.25b).

Spot Ranging

Civil engineers have used laser-based "spot range finders" for some time. In laboratory-size environments, they are a relatively new development. There are two basic techniques. First, one can emit a very sharp pulse and time its return ("lidar," the light equivalent of radar). This requires a sophisticated laser and electronics, since light moves approximately 1 ft every billionth of a second. The second technique is to modulate the laser light in amplitude and upon its return compare the phase of the returning light with that of the modulator. The phase differences are related to the distance traveled [Nitzan et al. 1977]. A representative image is shown in Fig. 2.26.

Both these techniques produce results that are accurate to within about 1% of the range. Both of them allow the laser to be placed close to a camera, and thus "intensity maps" (images) and range maps may be produced from single viewpoints. The laser beam can easily poke into holes, and the return beam may be sensed close to the emitted one, so concavities do not present a serious problem. Since the laser beam is attenuated by absorption, it can yield intensity information as well. If the laser produces light of several wavelengths, it is possible to use filters and obtain multispectral reflectance information as well as depth information from the same device [Garvey 1976; Nitzan et al. 1977].
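The core light-striping computation, intersecting a pixel's line of sight with the known light plane, can be sketched as follows. This is a minimal sketch, assuming the camera focal point and the image point are already expressed in world coordinates and that the light plane is given as n · X = d; the function and parameter names are hypothetical, and camera calibration is not covered.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def stripe_point(focal_point, image_point, plane_normal, plane_offset, eps=1e-12):
    """Intersect the line of sight with the light plane dot(n, X) = d.

    The line of sight is X(s) = focal_point + s * (image_point - focal_point).
    Returns the 3-D world point, or None when the line of sight is (nearly)
    parallel to the light plane, so that no single intersection exists.
    """
    direction = [p - f for p, f in zip(image_point, focal_point)]
    denom = dot(plane_normal, direction)
    if abs(denom) < eps:
        return None
    s = (plane_offset - dot(plane_normal, focal_point)) / denom
    return tuple(f + s * d for f, d in zip(focal_point, direction))


# Hypothetical setup: focal point at the origin, image plane at z = 1,
# light plane x = 0.5 (normal (1, 0, 0), offset 0.5).
print(stripe_point((0.0, 0.0, 0.0), (0.25, 0.0, 1.0), (1.0, 0.0, 0.0), 0.5))
# (0.5, 0.0, 2.0)
```

The `None` case corresponds to the degenerate geometry mentioned above, where a line of sight lies in (or runs parallel to) the light plane.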
The usual mode of use of a spot-ranging device is to produce a range map that corresponds to an intensity map. This has its advantages in that the correspondence may be close. The structural properties of light stripes are lost: It can be hard to "segment" the image into surfaces (to tell which "range pixels" are associated with the same surface). Range maps are amenable to the same sorts of segmentation techniques that are used for intensity images: Hough techniques, region growing, or differentiation-based methods of edge finding (Part II).

Ultrasonic Ranging

Just as light can be pulsed to determine range, so can sound and ultrasound (frequencies much higher than the audible range). Ultrasound has been used extensively in medicine to produce images of human organs (e.g., [Waag and Gramiak 1976]). The time between the transmitted and received signal determines range; the sound signal travels much slower than light, making the problem of timing the returning signal rather easier than it is in pulsed laser devices. However, the signal is severely attenuated as it travels through biological tissue, so that the detection apparatus must be very sensitive.
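The timing arithmetic behind pulsed ranging (lidar and ultrasound) and behind the phase-comparison method can be sketched as below. Assumptions not taken from the text: the echo path is twice the range, the speed of sound is taken as about 1540 m/s (a commonly assumed value for soft tissue), and a measured phase difference is only unambiguous within half a modulation wavelength. The function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s, in vacuum
SPEED_OF_SOUND_TISSUE = 1540.0  # m/s, a commonly assumed value for soft tissue


def range_from_echo(round_trip_s, speed):
    """Pulsed ranging: the signal travels out and back, so halve the path."""
    return speed * round_trip_s / 2.0


def range_from_phase(phase_rad, modulation_hz, speed=SPEED_OF_LIGHT):
    """Phase-comparison ranging for amplitude-modulated light.

    The returning modulation lags by phase = 2*pi * (2 * range) / wavelength,
    so range = phase * wavelength / (4*pi). The answer repeats every half
    modulation wavelength, so it is ambiguous beyond that distance.
    """
    wavelength = speed / modulation_hz
    return phase_rad * wavelength / (4.0 * math.pi)


# A target 3 m away returns a light pulse after about 20 ns ...
print(2 * 3.0 / SPEED_OF_LIGHT)          # about 2.0e-08 s
# ... while an echo from 10 cm deep in tissue returns after about 130 us.
print(2 * 0.10 / SPEED_OF_SOUND_TISSUE)  # about 1.3e-04 s
# Half a cycle of 10 MHz modulation corresponds to about 7.5 m of range.
print(range_from_phase(math.pi, 10e6))   # about 7.49
```

The roughly thousandfold difference between the two round-trip times is the reason the text calls timing the returning sound signal "rather easier."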

Fig. 2.26 Intensity and range images. (a) A (synthesized) intensity image of a street scene with potholes. The roofs all have the same intensity, which is different from the walls; (b) a corresponding range image. The wall and roof of each house have similar ranges, but the ranges differ from house to house.

One basic difference between sound and visible light ranging is that a light beam is usually reflected off just one surface, but that a sound beam is generally partially transmitted and partially reflected by "surfaces." The returning sound pulse has structure determined by the discontinuities in impedance to sound found in the medium through which it has passed. Roughly, a light beam returns information about a spot, whereas a sound beam can return information about the medium in the entire column of material. Thus, although sound itself travels relatively slowly, the data rate implicit in the returning structured sound pulse is quite high. Figure 2.27 shows an image made using the range data from ultrasound. The sound pulses emanate from the top of the image and proceed toward the bottom, being partially reflected and transmitted along the way. In the figure, it is as if we were looking perpendicular to the beams, which are being displayed as brighter where strong reflectance is taking place. A single "scan line" of sound thus produces an image of an entire planar slice of medium.

Fig. 2.27 Image made from ultrasound ranging.
2.3.3 Reconstruction Imaging

Two-dimensional reconstruction has been the focus of much research attention because of its important medical applications. High-quality images such as that shown in Fig. 1.2b can be formed by multiple images of x-ray projection data. This section contains the principles behind the most important reconstruction algorithms. These techniques are discussed in more detail with an expanded list of references in [Gordon and Herman 1974]. For a view of the many applications of two-dimensional reconstruction other than transmission scanning, the reader is referred to [Gordon et al. 1975].

Figure 2.28 shows the basic geometry to collect one-dimensional projections of two-dimensional data. (Most systems construct the image in a plane and repeat this technique for other planes; there are few true three-dimensional reconstruction systems that use planes of projection data simultaneously to construct volumes.) In many applications sensors can measure the one-dimensional projection of two-dimensional image data. The projection $g_\theta(x')$ of an ideal image $f(x, y)$ in the direction $\theta$ is given by $g_\theta(x') = \int f(x', y')\,dy'$, where $\mathbf{x}' = R_\theta \mathbf{x}$. If enough different projections are obtained, a good approximation to the image can be obtained with two-dimensional reconstruction techniques.

From Fig. 2.28, with the source at the first position along line AA', we can obtain the first projection datum from the detector at the first position along BB'. The line AB is termed a ray and the measurement at B a ray sum. Moving the source and detector along lines AA' and BB' in synchrony allows us to obtain the entire data for projection 1. Now the lines AA' and BB' are rotated by a small angle $\Delta\theta$ about O and the process is repeated. In the original x-ray systems $\Delta\theta$ was 1° of angle, and 180 projections were taken. Each projection comprised 160 transmission measurements.

Fig. 2.28 Projection geometry.

The reconstruction problem is simply this: Given the projection data $g_k(x')$, $k = 0, \ldots, N-1$, construct the original image $f(\mathbf{x})$. Systems in use today use a fan beam rather than the parallel rays shown. However, the mathematics is simpler for parallel rays and illustrates the fundamental ideas. We describe three related techniques: summation, Fourier interpolation, and convolution.

The Summation Method

The summation method is simple: Distribute every ray sum $g_k(x')$ over the image cells along the ray. Where there are N cells along a ray, each such cell is incremented by $g_k(x')$. This step is termed back projection. Repeating this process for every ray results in an approximate version of the original [DeRosier 1971]. This technique is equivalent (within a scale factor) to blurring the image, or convolving it with a certain point spread function. In the continuous case of infinitely many projections, this function is simply the radially symmetric $h(r) = 1/r$.
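A small discrete version of the summation method makes the blurring visible. This is a minimal sketch, not a production reconstructor: it assumes a square image, parallel rays, and nearest-bin assignment of each pixel to a ray (a crude stand-in for true line integrals). Following the text, every cell on a ray receives the whole ray sum, which changes the result only by a scale factor. The helper names `ray_bin`, `project`, and `back_project` are hypothetical.

```python
import math


def ray_bin(row, col, theta, n):
    """Detector bin hit by the ray through pixel (row, col) at angle theta.

    Pixel centres are measured from the image centre with y pointing up, so
    x' = x cos(theta) + y sin(theta); the nearest of n unit-width bins is used.
    """
    c = (n - 1) / 2.0
    x, y = col - c, c - row
    return int(round(x * math.cos(theta) + y * math.sin(theta) + c))


def project(image, theta):
    """Approximate parallel projection g_theta(x'): the ray sums of one view."""
    n = len(image)
    g = [0.0] * n
    for row in range(n):
        for col in range(n):
            k = ray_bin(row, col, theta, n)
            if 0 <= k < n:
                g[k] += image[row][col]
    return g


def back_project(projections, thetas, n):
    """Summation method: add each ray sum g_k(x') to every cell on that ray."""
    out = [[0.0] * n for _ in range(n)]
    for g, theta in zip(projections, thetas):
        for row in range(n):
            for col in range(n):
                k = ray_bin(row, col, theta, n)
                if 0 <= k < n:
                    out[row][col] += g[k]
    return out


# A single bright point at the centre of a 5 x 5 image, 8 views over [0, pi).
n = 5
point = [[0.0] * n for _ in range(n)]
point[2][2] = 1.0
thetas = [k * math.pi / 8 for k in range(8)]
recon = back_project([project(point, t) for t in thetas], thetas, n)
for row in recon:
    print(row)
```

The centre cell collects the point's ray sum from all eight views, but off-centre cells still pick up a share from every view whose ray passes through both them and the point; that spread is the blurring described above.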

Fig. 2.29 Basis of Fourier techniques. (a) Projection axis x'; (b) corresponding axis in Fourier space.

Fourier Algorithms

If a projection is Fourier transformed, it defines a line through the origin in frequency space (Fig. 2.29). To show this formally, consider the expression for the two-dimensional transform

$$F(u, v) = \iint f(x, y)\,\exp[-j2\pi(ux + vy)]\,dx\,dy \qquad (2.47)$$

Now consider $\theta = 0$ (projection onto the x axis): $x' = x$ and

$$g_0(x') = \int f(x, y)\,dy \qquad (2.48)$$

The Fourier transform of this equation is

$$\mathcal{F}[g_0(x')] = \int \left[\int f(x, y)\,dy\right] \exp(-j2\pi ux)\,dx \qquad (2.49)$$

$$= \iint f(x, y)\,\exp(-j2\pi ux)\,dy\,dx$$

which, by comparison with (2.47), is

$$\mathcal{F}[g_0(x')] = F(u, 0) \qquad (2.50)$$

Generalizing to any $\theta$, the transform of an arbitrary $g_\theta(x')$ defines a line in the Fourier space representation of the cross section. Where $S_k(\omega)$ is the cross section of the Fourier transform along this line,

$$S_k(\omega) = F(\omega\cos\theta_k,\ \omega\sin\theta_k) = \int g_k(x')\,\exp(-j2\pi\omega x')\,dx' \qquad (2.51)$$

Thus one way of reconstructing the original image is to use the Fourier transform of the projections to define points in the transform of $f(\mathbf{x})$, interpolate the undefined points of the transform from the known points, and finally take the inverse transform to obtain the reconstructed image.
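Equation (2.50) has an exact discrete counterpart that is easy to check numerically: the 1-D DFT of the column sums of a sampled image equals the v = 0 row of its 2-D DFT. The sketch below uses a small made-up test image and direct DFT sums (not an FFT) for clarity; the function names are hypothetical. For angles other than θ = 0 the samples of the slice no longer fall on the DFT grid, which is where the interpolation step mentioned above comes in.

```python
import cmath


def dft_1d(seq):
    """Direct 1-D DFT: G[u] = sum_x g[x] exp(-j 2 pi u x / N)."""
    n = len(seq)
    return [sum(seq[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


def dft_2d(img):
    """Direct 2-D DFT, indexed F[v][u], for an image stored as img[y][x]."""
    ny, nx = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * cmath.pi * (u * x / nx + v * y / ny))
                 for y in range(ny) for x in range(nx))
             for u in range(nx)]
            for v in range(ny)]


# Small made-up test image, stored as rows image[y][x].
image = [[1, 2, 0, 3],
         [0, 1, 4, 1],
         [2, 0, 1, 0]]

# Projection onto the x axis (theta = 0): sum over y, the discrete Eq. (2.48).
g0 = [sum(row[x] for row in image) for x in range(len(image[0]))]

slice_from_projection = dft_1d(g0)  # left side of Eq. (2.50)
row_v0 = dft_2d(image)[0]           # F(u, 0), right side of Eq. (2.50)
print(max(abs(a - b) for a, b in zip(slice_from_projection, row_v0)))  # round-off only
```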
Fig. 2.30 Convolution method.

This technique can be applied with transforms other than the Fourier transform, and such methods are discussed in [DeRosier 1971; Crowther and Klug 1971].

The Convolution Method

The convolution method is the natural extension of the summation method. Since the summation method produces an image degraded from its convolution with some function h, one can remove the degradation by a "deconvolution." The straightforward way to accomplish this is to Fourier transform the degraded image, multiply the result by an estimate of the transformed $h^{-1}$, and inverse Fourier transform the result. However, since all the operations are linear, a faster approach is to deconvolve the projections before performing the back projection. To show this formally, we use the inverse transform

$$f(\mathbf{x}) = \iint F(u, v)\,\exp[j2\pi(ux + vy)]\,du\,dv \qquad (2.52)$$

Changing to cylindrical coordinates $(\omega, \theta)$ yields

$$f(\mathbf{x}) = \int_0^{\pi}\!\int_{-\infty}^{\infty} F_\theta(\omega)\,\exp[j2\pi\omega(x\cos\theta + y\sin\theta)]\,|\omega|\,d\omega\,d\theta \qquad (2.53)$$

Since $x' = x\cos\theta + y\sin\theta$, rewrite Eq. (2.53) as

$$f(\mathbf{x}) = \int \mathcal{F}^{-1}\{F_\theta(\omega)\,H(\omega)\}\,d\theta \qquad (2.54)$$

where $H(\omega) = |\omega|$. Since the image is band-limited at some interval $(-\omega_m, \omega_m)$, one can define $H(\omega)$ arbitrarily outside of this interval. Therefore, $H(\omega)$ can be defined as a constant minus a triangular peak as shown in Fig. 2.30. Finally, the operation inside the integral in Eq. (2.54) is a convolution. Using the transforms shown in Fig. 2.30,

$$f(\mathbf{x}) = \int \left[g_\theta(x') * h(x')\right] d\theta, \qquad h(x') = 2\omega_m^2\,\mathrm{sinc}(2\omega_m x') - \omega_m^2\,\mathrm{sinc}^2(\omega_m x') \qquad (2.55)$$

where $\mathrm{sinc}(t) = \sin(\pi t)/(\pi t)$.
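The kernel h(x') in Eq. (2.55) can be checked numerically and then applied to a sampled projection. In this sketch sinc(t) = sin(πt)/(πt); with unit sample spacing and ω_m = 1/2 (the Nyquist frequency), the samples of h reduce to 1/4 at 0, zero at other even offsets, and -1/(π²k²) at odd offsets k. The function names are hypothetical, the projection is assumed to be sampled at unit spacing, and edge effects are ignored; back projection of the filtered projections would then proceed exactly as in the summation method.

```python
import math


def sinc(t):
    """Normalized sinc, sin(pi t) / (pi t), with sinc(0) = 1."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def ramp_kernel(x, w_m):
    """h(x') of Eq. (2.55): inverse transform of H(w) = |w| on (-w_m, w_m).

    H is a box of height w_m minus a triangle of peak w_m, so
    h = 2 w_m^2 sinc(2 w_m x) - w_m^2 sinc^2(w_m x).
    """
    return 2 * w_m ** 2 * sinc(2 * w_m * x) - w_m ** 2 * sinc(w_m * x) ** 2


def ramp_kernel_numeric(x, w_m, steps=20000):
    """Midpoint-rule integral of |w| exp(j 2 pi w x) over (-w_m, w_m).

    The imaginary (sine) part cancels by symmetry, so only the cosine is summed.
    """
    dw = 2 * w_m / steps
    total = 0.0
    for i in range(steps):
        w = -w_m + (i + 0.5) * dw
        total += abs(w) * math.cos(2 * math.pi * w * x) * dw
    return total


def filter_projection(g, w_m=0.5):
    """Convolve one projection, sampled at unit spacing, with h(x')."""
    n = len(g)
    return [sum(g[j] * ramp_kernel(i - j, w_m) for j in range(n)) for i in range(n)]


print(ramp_kernel(0.3, 0.5), ramp_kernel_numeric(0.3, 0.5))  # agree closely
# The projection of a single point becomes a peak with negative side lobes.
print(filter_projection([0, 0, 1, 0, 0]))  # about [0, -0.101, 0.25, -0.101, 0]
```

The negative side lobes are what cancel the blur that plain back projection would otherwise leave around each point.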

Owing to its speed and the fact that the deconvolutions can be performed while the data are being acquired, the convolution method is the method employed in the majority of systems.

EXERCISES

2.1 In a binocular animal vision system, assume a focal length f of an eye of 50 mm and a separation distance d of 5 cm. Make a plot of $\Delta x$ vs. z using Eq. (2.9). If the resolution of each eye is on the order of 50 line pairs/mm, what is the useful range of the binocular system?

2.2 In an opponent-process color vision system, assume that the following relations hold:

[Diagram: opponent axes R-G (red, green) and B-Y (blue, yellow).]

For example, if the (R-G, B-Y, W-Bk) components of the opponent-process system are (0.5, 3, 4), the perceived color will be blue. Work out the perceived colors for the following (R, G, B) measurements:
(a) (0.2, 0.3, 0.4)
(b) (0.2, 0.3, 0)
(c) (7, 4, 1)

2.3 Develop an indexing scheme for a hexagonal array and define a Euclidean distance measure between points in the array.

2.4 Assume that a one-dimensional image has the following form:

$$f(x) = \cos(2\pi u_0 x)$$

and is sampled with $u_s = u_0$. Using the graphical method of Section 2.2.6, find an expression for f(x) as given by Eq. (2.49). Is this expression equal to the original image? Explain.

2.5 A certain image has the following Fourier transform:

$$F(\mathbf{u}) = \begin{cases} \text{nonzero} & \text{inside a hexagonal domain} \\ 0 & \text{otherwise} \end{cases}$$

(a) What are the smallest values for u and v so that $F(\mathbf{u})$ can be reconstructed from $F_s(\mathbf{u})$?
(b) Suppose now that rectangular sampling is not used but that now the u and v directions subtend an angle of $\pi/3$. Does this change your answer as to the smallest u and v? Explain.

2.6 Extend the binocular imaging model of Fig. 2.3 to include convergence: Let the two imaging systems pivot in the y = 0 plane about the viewpoint. Let the system have a baseline of 2d and be converged at some angle $\theta$ such that a point (x, y, z) appears at the origin of each image plane.
(a) Solve for z in terms of d and $\theta$.
(b) Solve for z in this situation for points with nonzero disparity.

2.7 Compute the convolution of two Rect functions, where

$$\mathrm{Rect}(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

Show the steps in your calculations.

2.8 Let

$$\mathrm{Rect}(x) = \begin{cases} b & |x| < a \\ 0 & \text{otherwise} \end{cases}$$

(a) What is $\mathrm{Rect}(x) * \delta(x - a)$?
(b) What is the Fourier transform of f(x) where $f(x) = \mathrm{Rect}(x + c) + \mathrm{Rect}(x - c)$ and c > a?

2.9 A digitizer has a sampling interval of $\Delta x = \Delta y = \Delta$. Which of the following images can be represented unambiguously by their samples? (Assume that effects of a finite image domain can be neglected.)
(a) $\sin(\pi x/\Delta)/(\pi x/\Delta)$
(b) $\cos(\pi x/2\Delta)\cos(3\pi x/4\Delta)$
(c) Rect(x) (see Problem 2.8)
(d) $e^{-x^2}$

REFERENCES

AGIN, G. J. "Representation and description of curved objects" (Ph.D. dissertation). AIM-173, Stanford AI Lab, October 1972.
ANDREWS, H. C. and B. R. HUNT. Digital Image Restoration. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1977.
CROWTHER, R. A. and A. KLUG. "ART and science, or, conditions for 3-d reconstruction from electron microscope images." J. Theoretical Biology 32, 1971.
DEROSIER, D. J. "The reconstruction of three-dimensional images from electron micrographs." Contemporary Physics 12, 1971.
DUDA, R. O. and P. E. HART. Pattern Recognition and Scene Analysis. New York: Wiley, 1973.
GARVEY, T. D. "Perceptual strategies for purposive vision." Technical Note 117, AI Center, SRI International, September 1976.
GONZALEZ, R. C. and P. WINTZ. Digital Image Processing. Reading, MA: Addison-Wesley, 1977.
GORDON, R. and G. T. HERMAN. "Three-dimensional reconstruction from projections: a review of algorithms." International Review of Cytology 38, 1974, 111-151.
GORDON, R., G. T. HERMAN, and S. A. JOHNSON. "Image reconstruction from projections." Scientific American, October 1975.
HERING, E. "Principles of a new theory of color sense." In Color Vision, R. C. Teevan and R. C. Birney (Eds.). Princeton, NJ: D. Van Nostrand, 1961.
HORN, B. K. P. "Understanding image intensities." Artificial Intelligence 8, 2, April 1977, 201-231.
HORN, B. K. P. and R. W. SJOBERG. "Calculating the reflectance map." Proc., DARPA IU Workshop, November 1978, 115-126.
HURVICH, L. M. and D. JAMESON. "An opponent-process theory of color vision." Psychological Review 64, 1957, 384-390.
JAIN, A. K. "Advances in mathematical models for image processing." Proc. IEEE 69, 5, May 1981, 502-528.
JOBLOVE, G. H. and D. GREENBERG. "Color spaces for computer graphics." Computer Graphics 12, 3, August 1978, 20-25.
KENDER, J. R. "Saturation, hue, and normalized color: calculation, digitization effects, and use." Technical Report, Dept. of Computer Science, Carnegie-Mellon Univ., November 1976.
LAND, E. H. "The retinex theory of color vision." Scientific American, December 1977, 108-128.
MUNSELL, A. H. A Color Notation, 8th ed. Baltimore, MD: Munsell Color Co., 1939.
NICODEMUS, F. E., J. C. RICHMOND, J. J. HSIA, I. W. GINSBERG, and T. LIMPERIS. "Geometrical considerations and nomenclature for reflectance." NBS Monograph 160, National Bureau of Standards, U.S. Department of Commerce, Washington, DC, October 1977.
NITZAN, D., A. BRAIN, and R. DUDA. "The measurement and use of registered reflectance and range data in scene analysis." Proc. IEEE 65, 2, February 1977.
POPPLESTONE, R. J., C. M. BROWN, A. P. AMBLER, and G. F. CRAWFORD. "Forming models of plane-and-cylinder faceted bodies from light stripes." Proc., 4th IJCAI, September 1975, 664-668.
PRATT, W. K. Digital Image Processing. New York: Wiley-Interscience, 1978.
ROSENFELD, A. and A. C. KAK. Digital Picture Processing. New York: Academic Press, 1976.
SMITH, A. R. "Color gamut transform pairs." Computer Graphics 12, 3, August 1978, 12-19.
SUGIHARA, K. "Dictionary-guided scene analysis based on depth information." In Progress Report on 3-D Object Recognition. Bionics Research Section, ETL, Tokyo, March 1977.
TENENBAUM, J. M. and S. WEYL. "A region analysis subsystem for interactive scene analysis." Proc., 4th IJCAI, September 1975, 682-687.
WAAG, R. B. and R. GRAMIAK. "Methods for ultrasonic imaging of the heart." Ultrasound in Medicine and Biology 2, 1976, 163-170.
WILL, P. M. and K. S. PENNINGTON. "Grid coding: a preprocessing technique for robot and machine vision." Artificial Intelligence 2, 3/4, Winter 1971, 319-329.

Early Processing
3.1 RECOVERING INTRINSIC STRUCTURE

The imaging process confounds much useful physical information into the gray-level array. In this respect, the imaging process is a collection of degenerate transformations. However, this information is not irrevocably lost, because there is much spatial redundancy: Neighboring pixels in the image have the same or nearly the same physical parameters. A collection of techniques, which we call early processing, exploits this redundancy in order to undo the degeneracies in the imaging process. These techniques have the character of transformations for changing the image into "parameter images" or intrinsic images [Barrow and Tenenbaum 1978; 1981] which reflect the spatial properties of the scene. Common intrinsic parameters are surface discontinuities, range, surface orientation, and velocity.

In this chapter we neglect high-level internal model information even though it is important and can affect early processing. Consider the case of the perceived central edge in Fig. 3.1a. As shown by Fig. 3.1b, which shows portions of the same image, the central edge of Fig. 3.1a is not present in the data. Nevertheless, the human perceiver "sees" the edge, and one reasonable explanation is that it is a product of an internal block model. Model-directed activity is taken up in later chapters. These examples show how high-level models (e.g., circles) can affect low-level processors (e.g., edge finders). However, for the purposes of study it is often helpful to neglect these effects. These simplifications make it easier to derive the fundamental constraints between the physical parameters and gray levels. Once these are understood, they can be modified using the more abstract structures of later chapters.

Fig. 3.1 (a) A perceived edge. (b) Portions of image in (a) showing the lack of image data.

Most early computer vision processing can be done with parallel computations whose inputs tend to be spatially localized. When computing intrinsic images the parallel computations are iterated until the intrinsic parameter measurements converge to a set of values. A computation that falls in the parallel-iterative category is known in computer vision as relaxation [Rosenfeld et al. 1976]. Relaxation is a very general computational technique that is useful in computer vision. Specific examples of relaxation computations appear throughout the book; general observations on relaxation appear in Chapter 12.

This chapter covers six categories of early processing techniques:

1. Filtering is a generic name for techniques of changing image gray levels to enhance the appearance of objects. Most often this means transformations that make the intensity discontinuities between regions more prominent. These transformations are often dependent on gross object characteristics. For example, if the objects of interest are expected to be relatively large, the image can be blurred to erase small intensity discontinuities while retaining those of the object's boundary. Conversely, if the objects are relatively small, a transformation that selectively removes large discontinuities may be appropriate. Filtering can also compensate for spatially varying illumination.
2. Edge operators detect and measure very local discontinuities in intensity or its gradient. The result of an edge operator is usually the magnitude and orientation of the discontinuity.
3. Range transforms use known geometry about stereo images to infer the distance of points from the viewer. These transforms make use of the inverse perspective transform to interpret how points in three-dimensional space project onto stereo pairs. A correspondence between points in two stereo images of known geometry determines the range of those points. Relative range may also be derived from local correspondences without knowing the imaging geometry precisely.
4. Surface orientation can be calculated if the source illumination and reflectance properties of the surface are known. This calculation is sometimes called "shape from shading." Surface orientation is particularly simple to calculate when the source illumination can be controlled.
5. Optical flow, or velocity fields of image points, can be calculated from local temporal and spatial variations in sequences of gray-level images.
6. A pyramid is a general structure for representing copies of the image at multiple resolutions. A pyramid is a "utility structure" which can dramatically improve the speed and effectiveness of many early processing and later segmentation algorithms.

3.2 FILTERING THE IMAGE

Filtering is a very general notion of transforming the image intensities in some way so as to enhance or deemphasize certain features. We consider only transforms that leave the image in its original format: a spatial array of gray levels. Spurred on by the needs of planetary probes and aerial reconnaissance, filtering initially received more attention than any other area of image processing, and there are excellent detailed reference works (e.g., [Andrews and Hunt 1977; Pratt 1978; Gonzalez and Wintz 1977]). We cannot afford to examine these techniques in great detail here; instead, our intent is to describe a set of techniques that conveys the principal ideas.

Almost without exception, the best time to filter an image is at the image formation stage, before it has been sampled. A good example of this is the way chemical stains improve the effectiveness of microscopic tissue analysis by changing the image so that diagnostic features are obvious. In contrast, filtering after sampling often emphasizes random variations in the image, termed noise, that are undesirable effects introduced in the sampling stage. However, for cases where the image formation process cannot be changed, digital filtering techniques do exist. For example, one may want to suppress low spatial frequencies in an image and sharpen its edges. An image filtered in this way is shown in Fig. 3.2.

Note that in Fig. 3.2 the work of recognizing real-world objects still has to be done. Yet the edges in the image, which constitute object boundaries, have been made more prominent by the filtering operation. Good filtering functions are not easy to define. For example, one hazard with Fourier techniques is that sharp edges in the filter will produce unwanted "ringing" in the spatial domain, as evidenced by Fig. 2.5. Unfortunately, it would be too much of a digression to discuss techniques of filter design. Instead, the interested reader should refer to the references cited earlier.
3.2.1 Template Matching

Template matching is a simple filtering method of detecting a particular feature in an image. Provided that the appearance of this feature in the image is known accurately, one can try to detect it with an operator called a template. This template is, in effect, a subimage that looks just like the image of the object. A similarity measure is computed which reflects how well the image data match the template for each possible template location. The point of maximal match can be selected as the location of the feature. Figure 3.3 shows an industrial image and a relevant template.

Fig. 3.2 Effects of high-frequency filtering. (a) Original image. (b) Filtered image.

Correlation

One standard similarity measure between a function $f(\mathbf{x})$ and a template $t(\mathbf{x})$ is the Euclidean distance $d(\mathbf{y})$ squared, given by

$$d^2(\mathbf{y}) = \sum_{\mathbf{x}} \left[f(\mathbf{x}) - t(\mathbf{x} - \mathbf{y})\right]^2 \qquad (3.1)$$

By $\sum_{\mathbf{x}}$ we mean $\sum_{x=-M}^{M}\sum_{y=-N}^{N}$, for some M, N which define the size of the template extent. If the image at point $\mathbf{y}$ is an exact match, then $d(\mathbf{y}) = 0$; otherwise, $d(\mathbf{y}) > 0$. Expanding the expression for $d^2$, we can see that

$$d^2(\mathbf{y}) = \sum_{\mathbf{x}} \left[f^2(\mathbf{x}) - 2f(\mathbf{x})\,t(\mathbf{x} - \mathbf{y}) + t^2(\mathbf{x} - \mathbf{y})\right] \qquad (3.2)$$

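Notice that $\sum_{\mathbf{x}} t^2(\mathbf{x} - \mathbf{y})$ is a constant term and can be neglected. When $\sum_{\mathbf{x}} f^2(\mathbf{x})$ is approximately constant it too can be discounted, leaving what is called the cross-correlation between f and t:

$$R_{ft}(\mathbf{y}) = \sum_{\mathbf{x}} f(\mathbf{x})\,t(\mathbf{x} - \mathbf{y}) \qquad (3.3)$$

This is maximized when the portion of the image "under" t is identical to t.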
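Equations (3.1) to (3.3) can be tried directly on the template and image of Fig. 3.4. This sketch assumes offsets that keep the template entirely inside the image (as in that figure) and indexes each offset by the row and column of the window's top-left corner; the helper names are hypothetical. It also shows what is lost by discounting the $f^2$ term: here the squared distance is smallest at the correct offset (0, 0), whereas the raw correlation peaks at the bright noise point.

```python
def window(image, r, c, h, w):
    """Gray levels under an h x w template whose top-left corner is at (r, c)."""
    return [image[r + i][c + j] for i in range(h) for j in range(w)]


def sq_distance(image, template, r, c):
    """Squared Euclidean distance d^2 at one offset, Eq. (3.1)."""
    t = [v for row in template for v in row]
    f = window(image, r, c, len(template), len(template[0]))
    return sum((a - b) ** 2 for a, b in zip(f, t))


def cross_correlation(image, template, r, c):
    """Sum of products R_ft at one offset, Eq. (3.3)."""
    t = [v for row in template for v in row]
    f = window(image, r, c, len(template), len(template[0]))
    return sum(a * b for a, b in zip(f, t))


def score_all_offsets(image, template, score):
    """Evaluate `score` at every offset that keeps the template inside the image.

    An n x n image and an m x m template give an (n - m + 1) x (n - m + 1) array.
    """
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    return [[score(image, template, r, c) for c in range(cols)] for r in range(rows)]


# Template and noisy image from Fig. 3.4.
TEMPLATE = [[1, 1, 1] for _ in range(3)]
IMAGE = [[1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0],
         [1, 0, 1, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 8]]

print(score_all_offsets(IMAGE, TEMPLATE, cross_correlation))
# [[7, 4, 2], [5, 3, 2], [2, 1, 9]]   peak at the false alarm, offset (2, 2)
print(score_all_offsets(IMAGE, TEMPLATE, sq_distance))
# [[2, 5, 7], [4, 6, 7], [7, 8, 56]]  minimum at the correct offset, (0, 0)
```

At every offset the two measures are tied together by Eq. (3.2): $d^2 = \sum f^2 - 2R_{ft} + \sum t^2$, with $\sum t^2 = 9$ for this template.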
Fig. 3.3 An industrial image and template for a hexagonal nut.

One may visualize the template matching calculations by imagining the template being shifted across the image to different offsets; then the superimposed values at this offset are multiplied together, and the products are added. The resulting sum of products forms an entry in the "correlation array" whose coordinates are the offsets attained by the source template.

If the template is allowed to take all offsets with respect to the image such that some overlap takes place, the correlation array is larger than either the template or the image. An n × n image with an m × m template yields an (n + m - 1) × (n + m - 1) correlation array. If the template is not allowed to shift off the image, the correlation array is (n - m + 1) × (n - m + 1), for m < n. Another form of correlation results from computing the offsets modulo the size of the image; in other words, the template "wraps around" the image. Being shifted off to the right, its right portion reappears on the left of the image. This sort of correlation is called periodic correlation, and those with no such wraparound properties are called aperiodic. We shall be concerned exclusively with aperiodic correlation. One can always modify the input to a periodic correlation algorithm by padding the outside with zeros so that the output is the aperiodic correlation.

Figure 3.4 provides an example of (aperiodic) "shift, add, multiply" template matching. This figure illustrates some difficulties with the simple correlation measure of similarity. Many of the advantages and disadvantages of this measure stem from the fact that it is linear. The advantages of this simplicity have mainly to do with the existence of algorithms for performing the calculation efficiently (in a transform domain) for the entire set of offsets. The disadvantages have to do with the fact that the metric is sensitive to properties of the image that may vary with the offset, such as its average brightness. Slight changes in the shape of the object, its size, orientation, or intensity values can also disturb the match.

    (a) Template      (b) Image         (c) Correlation
    1 1 1             1 1 0 0 0         7 4 2 x x
    1 1 1             1 1 1 0 0         5 3 2 x x
    1 1 1             1 0 1 0 0         2 1 9 x x
                      0 0 0 0 0         x x x x x
                      0 0 0 0 8         x x x x x
                                        (x = undefined)

Fig. 3.4 (a) A simple template. (b) An image with noise. (c) The aperiodic correlation array of the template and image. Ideally, peaks in the correlation indicate positions of good match. Here the correlation is only calculated for offsets that leave the template entirely within the image. The correct peak is the upper left one at 0,0 offset. The "false alarm" at offset 2,2 is caused by the bright "noise point" in the lower right of the image.

Nonetheless, the idea of template matching is important, particularly if Eq. (3.3) is viewed as a filtering operation instead of an algorithm that does all the work of object detection. With this viewpoint one chooses one or more templates (filters) that transform the image so that certain features of an object are more readily apparent. These templates generally highlight subparts of the objects. One such class of templates is edge templates (discussed in detail in Section 3.3).

We showed in Section 2.2.4 that convolution and multiplication are Fourier transform pairs. Now note that the correlation operation in (3.3) is essentially the same as a convolution with a function $t'(\mathbf{x}) = t(-\mathbf{x})$. Thus in a mathematical sense cross-correlation and convolution are equivalent. Consequently, if the size of the template is sufficiently large, it is cheaper to perform the template matching operation in the spatial frequency domain, by the same transform techniques as for filtering.

Normalized Correlation

A crucial assumption in the development of Eq. (3.3) was that the image energy covered by the matching template at any offset was constant; this leads to a linear correlation matching technique. This assumption is approximately correct if the average image intensity varies slowly compared to the template size, but a bright spot in the image can heavily influence the correlation by affecting the sum of products violently in a small area (Fig. 3.4). Even if the image is well behaved, the range of values of the metric can vary with the size of the matching template. Are there ways of normalizing the correlation metric to make it insensitive to these variations?

There is a well-known treatment of the normalized correlation operation. It has been used for a variety of tasks involving registration and stereopsis of images [Quam and Hannah 1974]. Let us say that two input images are being matched to find the best offset that aligns them.
Let $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ be the images to be matched. $q_2$ is the patch of $f_2$ (possibly all of it) that is to be matched with a similar-sized patch of $f_1$. $q_1$ is the patch of $f_1$ that is covered by $q_2$ when $q_2$ is offset by $\mathbf{y}$. Let $E(\,)$ be the expectation operator. Then

$$\sigma(q_1) = \left[E(q_1^2) - \big(E(q_1)\big)^2\right]^{1/2} \tag{3.4}$$

$$\sigma(q_2) = \left[E(q_2^2) - \big(E(q_2)\big)^2\right]^{1/2} \tag{3.5}$$

give the standard deviations of points in patches $q_1$ and $q_2$. (For notational convenience, we have dropped the spatial arguments of $q_1$ and $q_2$.) Finally, the normalized correlation is

$$N(\mathbf{y}) = \frac{E(q_1 q_2) - E(q_1)\,E(q_2)}{\sigma(q_1)\,\sigma(q_2)} \tag{3.6}$$

and $E(q_1 q_2)$ is the expected value of the product of intensities of points that are superimposed by the translation by $\mathbf{y}$.
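A minimal sketch of Eqs. (3.4)–(3.6) in pure Python, with images stored as lists of rows. It assumes that $q_2$ is the whole second image and that each offset $\mathbf{y} = (dy, dx)$ places $q_2$ wholly inside $f_1$; the function names `normalized_correlation` and `best_offset` and the small test image are illustrative choices, not from the text. The example shows the property that motivates the normalization: a patch whose intensities have been scaled and shifted ($3q + 10$) still correlates perfectly with its source.

```python
import math

def normalized_correlation(f1, q2, dy, dx):
    """N(y) of Eq. (3.6) for offset y = (dy, dx); q2 must fit inside f1 there."""
    h, w = len(q2), len(q2[0])
    # q1: the patch of f1 covered by q2 when q2 is offset by (dy, dx)
    q1 = [f1[dy + r][dx + c] for r in range(h) for c in range(w)]
    q2f = [q2[r][c] for r in range(h) for c in range(w)]
    n = len(q1)
    e1, e2 = sum(q1) / n, sum(q2f) / n
    e12 = sum(a * b for a, b in zip(q1, q2f)) / n
    # Eqs. (3.4) and (3.5); max() guards against tiny negative round-off
    s1 = math.sqrt(max(0.0, sum(a * a for a in q1) / n - e1 * e1))
    s2 = math.sqrt(max(0.0, sum(b * b for b in q2f) / n - e2 * e2))
    if s1 == 0.0 or s2 == 0.0:
        return 0.0  # a flat patch carries no registration information
    return (e12 - e1 * e2) / (s1 * s2)

def best_offset(f1, q2):
    """Exhaustive search over every offset at which q2 lies inside f1."""
    H, W, h, w = len(f1), len(f1[0]), len(q2), len(q2[0])
    scores = {(dy, dx): normalized_correlation(f1, q2, dy, dx)
              for dy in range(H - h + 1) for dx in range(W - w + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]

f1 = [[3, 8, 1, 9, 4, 7],
      [6, 2, 7, 0, 5, 3],
      [9, 4, 6, 8, 1, 2],
      [1, 7, 3, 5, 9, 6],
      [5, 0, 8, 2, 6, 4],
      [2, 9, 4, 7, 3, 8]]
# A 3 x 3 patch cut from f1 at (1, 2), then scaled and offset in intensity
q2 = [[3 * v + 10 for v in row[2:5]] for row in f1[1:4]]
print(best_offset(f1, q2))
```

The search reports the offset (1, 2) with a score of 1 up to round-off, even though no pixel value of `q2` equals its counterpart in `f1`. The exhaustive search is the naive spatial-domain approach; it is only meant to show the metric, not an efficient matcher.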
Ch. 3 Early Processing

The normalized correlation metric is less dependent on the local properties of the reference and input images than is the unnormalized correlation, but it is sensitive to the signal-to-noise content of the images. High uncorrelated noise in the two images, or in the image and the reference, decreases the value of the correlation. As a result, one should exercise some care in interpreting the metric. If the noise properties of the image are known, one indication of reliability is given by the "(signal + noise)-to-noise" ratio. For the normalized correlation to be useful, the standard deviation of the patches of images to be matched (i.e., of the areas of image including noise) should be significantly greater than that of the noise. Then a correlation value may be considered significant if it is approximately equal to the theoretically expected one. Consider uncorrelated noise of identical standard deviation in a patch of true value $f(x, y)$. Let the noise component of the image be $n(x, y)$. Then the theoretical maximum correlation is

$$N_{\max} = \frac{\sigma^2(f)}{\sigma^2(f + n)} \tag{3.7}$$

In matching an idealized, noise-free reference pattern, the best expected value of the cross-correlation is

$$N_{\max} = \frac{\sigma(f)}{\sigma(f + n)} \tag{3.8}$$
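As an illustrative calculation (the numbers are chosen here, not taken from the text), suppose the signal standard deviation is twice the noise standard deviation, $\sigma(f) = 2\sigma(n)$. Because the noise is uncorrelated with the signal, the variances add, so $\sigma^2(f + n) = \sigma^2(f) + \sigma^2(n) = 5\sigma^2(n)$. Then

$$\text{Eq. (3.7):}\quad N_{\max} = \frac{4\sigma^2(n)}{5\sigma^2(n)} = 0.8, \qquad \text{Eq. (3.8):}\quad N_{\max} = \frac{2\sigma(n)}{\sqrt{5}\,\sigma(n)} \approx 0.894$$

so even a perfect alignment of two such noisy patches would be expected to score only about 0.8, and a measured value near 0.8 should be read as a good match rather than a poor one.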

If the noise and signal characteristics of the data are known, the patch size may be optimized by using that information and the simple statistical arguments above. However, such considerations leave out the effects of systematic, nonstatistical error (such as imaging distortions, rotations, and scale differences between images). These systematic errors grow with patch size, and may swamp the statistical advantages of large patches. In the worst case, they may vitiate the advantages of the correlation process altogether.

Since correlation is expensive, it is advantageous to ensure that there is enough information in the patches chosen for correlation before the operation is done. One way to do this is to apply a cheap "interest operator" before the relatively expensive correlation. The idea here is to make sure that the image varies enough to give a usable correlation image. If the image is of uniform intensity, even its correlation with itself (autocorrelation) is flat everywhere, and no information about where the image is registered with itself is derivable. The "interest operator" is a way of finding areas of image with high variance. In fact, a common and useful interest measure is exactly the (directional) variance over small areas of image. One directional variance algorithm works as follows.

The Moravec interest operator [Moravec 1977] produces candidate match points by measuring the distinctness of a local piece of the image from its surround. To explain the operator, we first define a variance measure at a pixel $(x, y)$ as
$$\operatorname{var}(x, y) = \sum_{(k,l)\in s} \left[f(x, y) - f(x+k, y+l)\right]^2 \tag{3.9}$$

$$s = \{(0, a),\ (0, -a),\ (a, 0),\ (-a, 0)\}$$

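where $a$ is a parameter. Now the interest operator value is initially the minimum of itself and surrounding points:

$$\text{IntOpVal}(\mathbf{x}) := \min_{|\mathbf{y}| \le 1}\left[\operatorname{var}(\mathbf{x} + \mathbf{y})\right] \tag{3.10}$$

Next a check is made to see if the operator is a local maximum by checking neighbors again. Only local maxima are kept:

$$\text{IntOpVal}(\mathbf{x}) := 0 \quad \text{if } \text{IntOpVal}(\mathbf{x}) < \text{IntOpVal}(\mathbf{x} + \mathbf{y}) \text{ for some } |\mathbf{y}| \le 1 \tag{3.11}$$

Finally, candidate points are chosen from the IntOpVal array by thresholding:

$$\mathbf{x} \text{ is a candidate point iff } \text{IntOpVal}(\mathbf{x}) > T \tag{3.12}$$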

The threshold is chosen empirically to produce some fraction of the total image points.
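The three stages of Eqs. (3.9)–(3.12) can be sketched in pure Python as follows. This is an illustration under stated assumptions: the image is a list of rows indexed `img[y][x]`, the variance is left at 0 wherever an offset in $s$ would leave the image, and "$|\mathbf{y}| \le 1$" is taken to mean the 3 × 3 neighbourhood. The function name `moravec` is hypothetical.

```python
def moravec(img, a=1, T=0.0):
    """Candidate match points (row, col) following Eqs. (3.9)-(3.12)."""
    rows, cols = len(img), len(img[0])
    s = [(0, a), (0, -a), (a, 0), (-a, 0)]  # offset set s of Eq. (3.9), as (row, col)

    # Eq. (3.9): directional variance; left at 0 where an offset leaves the image
    var = [[0.0] * cols for _ in range(rows)]
    for y in range(a, rows - a):
        for x in range(a, cols - a):
            var[y][x] = sum((img[y][x] - img[y + k][x + l]) ** 2 for k, l in s)

    def neighbourhood(y, x):
        """In-bounds pixels with |offset| <= 1, including (y, x) itself."""
        return [(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if 0 <= y + dy < rows and 0 <= x + dx < cols]

    # Eq. (3.10): minimum of var over the 3 x 3 neighbourhood
    val = [[min(var[v][u] for v, u in neighbourhood(y, x)) for x in range(cols)]
           for y in range(rows)]
    # Eq. (3.11): zero any value that has a larger neighbour (reads the unsuppressed `val`)
    kept = [[0.0 if any(val[y][x] < val[v][u] for v, u in neighbourhood(y, x))
             else val[y][x] for x in range(cols)] for y in range(rows)]
    # Eq. (3.12): threshold
    return [(y, x) for y in range(rows) for x in range(cols) if kept[y][x] > T]

checker = [[10 * ((x + y) % 2) for x in range(7)] for y in range(7)]
print(moravec(checker, a=1, T=100))
```

On the 7 × 7 checkerboard every interior pixel differs from all four neighbours, so the nine pixels (2..4, 2..4), whose whole 3 × 3 neighbourhood has full variance, survive; a uniform image yields no candidates at all, matching the remark that a flat image is useless for registration.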
3.2.2 Histogram Transformations

A gray-level histogram of an image is a function that gives the frequency of occurrence of each gray level in the image. Where the gray levels are quantized from 0 to the largest level, the value of the histogram at a particular gray level $p$, denoted $h(p)$, is the number or fraction of pixels in the image with that gray level. Figure 3.5 shows an image with its histogram.

Fig. 3.5 (a) An image. (b) Its intensity histogram.

A histogram is useful in many different ways. In this section we consider the histogram as a tool to guide gray-level transformation algorithms that are akin to filtering. A very useful image transform is called histogram equalization. Histogram equalization defines a mapping of gray levels $p$ into gray levels $q$ such that the distribution of gray levels $q$ is uniform. This mapping stretches contrast (expands the range of gray levels) for gray levels near histogram maxima and compresses contrast in areas with gray levels near histogram minima. Since contrast is expanded for most of the image pixels, the transformation usually improves the detectability of many image features.

The histogram equalization mapping may be defined in terms of the cumulative histogram for the image. To see this, consider Fig. 3.6a. To map a small interval of gray levels $dp$ onto an interval $dq$ in the general case, it must be true that

$$g(q)\,dq = h(p)\,dp \tag{3.13}$$

where $g(q)$ is the new histogram. If, in the histogram equalization case, $g(q)$ is to be uniform, then

$$g(q) = \frac{N^2}{M} \tag{3.14}$$

where $N^2$ is the number of pixels in the image and $M$ is the number of gray levels. Thus combining Eqs. (3.13) and (3.14) and integrating, we have

$$q = \frac{M}{N^2}\int_0^{p} h(s)\,ds \tag{3.15}$$

Fig. 3.6 (a) Basis for a histogram equalization technique. (b) Results of histogram equalization.

But Eq. (3.15) is simply the equation for the normalized cumulative histogram. Figure 3.6b shows the histogram-equalized image.
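A discrete version of Eq. (3.15) replaces the integral by the cumulative pixel count. The sketch below assumes integer gray levels 0..M−1 stored as lists of rows; the rounding and the subtraction of the smallest nonzero cumulative count (so that the darkest occupied level maps to 0 and a flat histogram maps to itself) are one common discrete choice, not something the text prescribes. The name `equalize` is hypothetical.

```python
def equalize(img, M):
    """Map gray levels 0..M-1 so that the output histogram is approximately flat."""
    P = sum(len(row) for row in img)        # total pixel count (N^2 for an N x N image)
    h = [0] * M                             # histogram h(p)
    for row in img:
        for p in row:
            h[p] += 1
    cum, running = [], 0                    # cumulative histogram: discrete integral of Eq. (3.15)
    for count in h:
        running += count
        cum.append(running)
    c_min = next(c for c in cum if c > 0)   # cumulative count at the darkest occupied level
    if P == c_min:                          # only one gray level present: nothing to stretch
        return [row[:] for row in img]
    # Scale the cumulative count to 0..M-1 and round half up
    lut = [int((M - 1) * (c - c_min) / (P - c_min) + 0.5) if c >= c_min else 0
           for c in cum]
    return [[lut[p] for p in row] for row in img]
```

For example, `equalize([[0, 0, 0, 0], [1, 1, 2, 3]], 4)` keeps the dominant dark level at 0 but pushes level 1 up to 2: contrast is stretched just above the histogram peak and compressed among the rare bright levels, as the text describes.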
3.2.3 Background Subtraction

Background subtraction can be another important filtering step in early processing. Many images can have slowly varying background gray levels which are incidental to the task at hand. Examples of such variations are:

Solution gradients in cell slides
Lighting variations on surfaces in office scenes
Lung images in a chest radiograph

Note that the last example is only a "background" in the context of looking for some smaller variations such as tumors or pneumoconiosis.

Background subtraction attempts to remove these variations by first approximating them (perhaps analytically) with a background image $f_b$ and then subtracting this approximation from the original image. That is, the new image $f'$ is

$$f'(\mathbf{x}) = f(\mathbf{x}) - f_b(\mathbf{x}) \tag{3.16}$$

Various functional forms have been tried for analytic representations of slowly varying backgrounds. In the simplest cases, $f_b(\mathbf{x})$ may be a constant,

$$f_b(\mathbf{x}) = c \tag{3.17}$$

or linear,

$$f_b(\mathbf{x}) = \mathbf{m}\cdot\mathbf{x} + c \tag{3.18}$$

A more sophisticated background model is to use a low-pass filtered variant of the original image:

$$f_b(\mathbf{x}) = \mathcal{F}^{-1}\left[H(\mathbf{u})\,F(\mathbf{u})\right] \tag{3.19}$$

where $H(\mathbf{u})$ is a low-pass filtering function. The problem with this technique is that it is global; one cannot count on the "best" effect in any local area since the filter treats all parts of the image identically. For the same reason, it is difficult to design a Fourier filter that works for a number of very different images.

A workable alternative is to approximate $f_b(\mathbf{x})$ using splines, which are piecewise polynomial approximation functions. The mathematics of splines is treated in Chapter 8 since they find more general application as representations of shape. The filtering application is important but specialized. The attractive feature of a spline approximation for filtering is that it is variation-diminishing and spatially variant. The spline approximation is guaranteed to be "smoother" than the original function and will approximate the background differently in different parts of the image. The latter feature distinguishes the method from Fourier domain techniques, which are spatially invariant. Figure 3.7 shows the results of spline filtering.
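The simplest analytic backgrounds, Eqs. (3.17) and (3.18), can be fitted by ordinary least squares and then subtracted as in Eq. (3.16). The sketch below does this along a single scan line, i.e., it assumes a one-dimensional $x$ to keep the example short; a planar fit over a 2-D image follows the same pattern with a small normal-equation solve. It is not the spline method the text recommends, and the function names are hypothetical.

```python
def linear_background(row):
    """Least-squares fit of f_b(x) = m*x + c (Eq. 3.18) along one scan line."""
    n = len(row)
    xbar = (n - 1) / 2
    fbar = sum(row) / n
    sxx = sum((x - xbar) ** 2 for x in range(n))
    sxf = sum((x - xbar) * (f - fbar) for x, f in enumerate(row))
    m = sxf / sxx if sxx else 0.0   # a one-pixel line degenerates to the constant of Eq. (3.17)
    c = fbar - m * xbar
    return [m * x + c for x in range(n)]

def subtract_background(row):
    """Eq. (3.16): f'(x) = f(x) - f_b(x)."""
    return [f - b for f, b in zip(row, linear_background(row))]

# A ramp background (slope 0.5) with a small bright feature at x = 6
line = [0.5 * x + 20 + (4 if x == 6 else 0) for x in range(12)]
print([round(v, 2) for v in subtract_background(line)])
```

After subtraction the ramp is gone and the feature at x = 6 stands out as the only clearly positive value. Because the feature itself slightly biases the global fit, every other residual is pulled a little below zero, which illustrates why a spatially variant model such as a spline can do better.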
3.2.4 Filtering and Reflectance Models

Leaving the effects of imaging geometry implicit (Section 2.2.2), the definitions in Section 2.2.3 imply that the image irradiance (gray level) at the image point $\mathbf{x}'$ is proportional to the product of the scene irradiance $E$ and the reflectance $r$ at its corresponding world point $\mathbf{x}$:

$$f(\mathbf{x}') = E(\mathbf{x})\,r(\mathbf{x}) \tag{3.20}$$

The irradiance at $\mathbf{x}$ is the sum of contributions from all illumination sources, and the reflectance is that portion of the irradiance which is reflected toward the observer (camera). Usually $E$ changes slowly over a scene, whereas $r$ changes quickly over edges, due to varying face angles, paint, and so forth. In many cases one would like to detect these changes in $r$ while ignoring changes in $E$. One way of doing this is to filter the image $f(\mathbf{x}')$ to eliminate the slowly varying component. However, as $f$ is the product of illumination and reflectance, it is difficult to define an operation that selectively diminishes $E$ while retaining $r$. Furthermore, such an operation must retain the positivity of $f$. One solution is to take the logarithm of Eq. (3.20). Then

$$\log f = \log E + \log r \tag{3.21}$$

Equation (3.21) shows two desirable properties of the logarithmic transformation: (1) the logarithmic image is positive in sign, and (2) the image is a superposition of the irradiance component and reflectance component. Since reflectance is an intrinsic characteristic of objects, the obvious goal of image analysis is to recognize the reflectance component under various conditions of illumination. Since the separation of the two components is preserved under linear transformations, and the irradiance component is usually of low spatial frequency compared to the reflectance component, filtering techniques can suppress the irradiance component of the signal relative to the reflectance component.

If the changes in $r$ occur over very short distances in the images, $r$ may be isolated by a three-step process [Horn 1974]. First, to enhance reflectance changes, the image function is differentiated (Section 3.3.1). The second step removes the low irradiance gradients by thresholding. Finally, the resultant image is integrated to obtain an image of perceived "lightness" or reflectance. Figure 3.8 shows these steps for the one-dimensional case.

Fig. 3.7 The results of spline filtering to remove background variation.

Fig. 3.8 Steps in processing an image to detect reflectance. (a) Original image. (b) Differentiation followed by thresholding. (c) Integration of function in (b).

A basic film parameter is density, which is proportional to the logarithm of transmitted intensity; the logarithmically transformed image is effectively a density image. In addition to facilitating the extraction of lightness, another advantage of the density image is that it is well matched to our visual experience. The ideas for many image analysis programs stem from our visual inspection of the image. However, the human visual system responds logarithmically to light intensity and also enhances high spatial frequencies [Stockham 1972]. Algorithms derived from introspective reasoning about the perceived image (which has been transformed by our visual system) will not necessarily be successful when applied to an unmodified intensity image. Thus one argument for using a density transformation followed by high-spatial-frequency emphasis filtering is that the computer is then "seeing" more like the human image analyzer.
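Horn's three-step process, applied to the logarithmic image of Eq. (3.21), can be sketched for the one-dimensional case of Fig. 3.8. The sketch assumes strictly positive intensities, a simple forward difference for "differentiate," and a running sum for "integrate"; the threshold value and the name `lightness_1d` are illustrative assumptions. The result is log-lightness, known only up to an additive constant.

```python
import math

def lightness_1d(intensity, threshold):
    """Three-step lightness recovery on one scan line; output starts at 0."""
    logs = [math.log(v) for v in intensity]                    # Eq. (3.21): log f = log E + log r
    diffs = [b - a for a, b in zip(logs, logs[1:])]            # step 1: differentiate
    kept = [d if abs(d) > threshold else 0.0 for d in diffs]   # step 2: drop small (illumination) gradients
    out = [0.0]
    for d in kept:                                             # step 3: integrate (running sum)
        out.append(out[-1] + d)
    return out

E = [1.0 + 0.02 * x for x in range(10)]   # slowly varying illumination
r = [0.2] * 5 + [0.8] * 5                 # reflectance step by a factor of 4
L = lightness_1d([e * s for e, s in zip(E, r)], threshold=0.1)
print([round(v, 3) for v in L])
```

The illumination ramp disappears entirely: the first five values are 0 and the last five are equal. Their level (about 1.405) is slightly above $\log 4 \approx 1.386$ because the small illumination change across the edge pixel is kept along with the reflectance step.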

3.3 FINDING LOCAL EDGES

Boundaries of objects tend to show up as intensity discontinuities in an image. Experiments with the human visual system show that boundaries in images are extremely important; often an object can be recognized from only a crude outline [Attneave 1954]. This fact provides the principal motivation for representing objects by their boundaries. Also, the boundary representation is easy to integrate into a large variety of object recognition algorithms.

One might expect that algorithms could be designed that find the boundaries of objects directly from the gray-level values in the image. But when the boundaries have complicated shapes, this is difficult. Much greater success has been obtained by first transforming the image into an intermediate image of local gray-level discontinuities, or edges, and then composing these into a more elaborate boundary. This strategy reflects the principle: When the gap between representations becomes too large, introduce intermediate representations. In this case, boundaries that are highly model-dependent may be decomposed into a series of local edges that are highly model-independent.

A local edge is a small area in the image where the local gray levels are changing rapidly in a simple (e.g., monotonic) way. An edge operator is a mathematical operator (or its computational equivalent) with a small spatial extent designed to detect the presence of a local edge in the image function.

It is difficult to specify a priori which local edges correspond to relevant boundaries in the image. Depending on the particular task domain, different local changes will be regarded as likely edges. Plots of gray level versus distance along the direction perpendicular to the edge for some hypothetical edges (Fig. 3.9a–e) demonstrate some different kinds of "edge profiles" that are commonly encountered. Of course, in most practical cases, the edge is noisy (Fig. 3.9d) and may appear as a composite of profile types. The fact that different kinds of edge operators perform best in different task domains has prompted the development of a variety of operators.
However, the unifying feature of most useful edge operators is that they compute a direction which is aligned with the direction of maximal gray-level change, and a magnitude describing the severity of this change. Since edges are a high-spatial-frequency phenomenon, edge finders are also usually sensitive to high-frequency noise, such as "snow" on a TV screen or film grain.

Sec. 3.3 Finding Local Edges

Operators fall into three main classes: (1) operators that approximate the mathematical gradient operator, (2) template matching operators that use multiple templates at different orientations, and (3) operators that fit local intensities with parametric edge models. Representative examples from the first two of these categories appear in this section. The computer vision literature abounds with edge operators, and we make no attempt to summarize them all here. For a guide to this literature, see [Rosenfeld and Kak 1976].

Fig. 3.9 Edge profiles.

Parametric models generally capture more detailed edge structure than the two-parameter direction and magnitude vector; as a result, they can be more computationally complicated. For this reason and others discussed in Section 3.3.4, we shall omit a detailed discussion of these kinds of edge operators. One of the best known parametric models is Hueckel's [Hueckel 1971, 1973], but several others have been developed since [Mero and Vassy 1975; Nevatia 1977; Abdou 1978; Tretiak 1979].
3.3.1 Types of Edge Operators

Gradient and Laplacian

The most common and historically earliest edge operator is the gradient [Roberts 1965]. For an image function $f(\mathbf{x})$, the gradient magnitude $s(\mathbf{x})$ and direction $\phi(\mathbf{x})$ can be computed as

$$s(\mathbf{x}) = \left(\Delta_1^2 + \Delta_2^2\right)^{1/2} \tag{3.22}$$

$$\phi(\mathbf{x}) = \operatorname{atan}(\Delta_2, \Delta_1) \tag{3.23}$$

where

$$\Delta_1 = f(x+n, y) - f(x, y), \qquad \Delta_2 = f(x, y+n) - f(x, y) \tag{3.24}$$

$n$ is a small integer, usually unity, and $\operatorname{atan}(x, y)$ returns $\tan^{-1}(x/y)$ adjusted to the proper quadrant. The parameter $n$ is called the "span" of the gradient. Roughly, $n$ should be small enough so that the gradient is a good approximation to the local changes in the image function, yet large enough to overcome the effects of small variations in $f$.

Equation (3.24) is only one difference operator, or way of measuring gray-level intensity differences along orthogonal directions using $\Delta_1$ and $\Delta_2$. Figure 3.10 shows the gradient difference operators compared to other operators [Roberts 1965; Prewitt 1970]. The reason for the modified operators of Prewitt and Sobel is that the local averaging tends to reduce the effects of noise. These operators do, in fact, perform better than the Roberts operator for a step edge model.

One way to study an edge operator's performance is to use an ideal edge such as the step edge shown in Fig. 3.11. This edge has two gray levels: zero and $h$ units. If the edge goes through the finite area associated with a pixel, the pixel is given a value between zero and $h$, depending on the proportion of its area covered. Comparisons of edge operator performance have been carried out [Abdou 1978]. In the case of the Sobel operator (Fig. 3.10c), the measured orientation $\phi'$ is given by
$$\phi' = \begin{cases} \phi, & 0 < \phi < \tan^{-1}(1/3) \\[4pt] \tan^{-1}\left[\dfrac{7\tan^2\phi + 6\tan\phi - 1}{-9\tan^2\phi + 22\tan\phi - 1}\right], & \tan^{-1}(1/3) < \phi < \pi/4 \end{cases} \tag{3.25}$$

Fig. 3.10 Gradient operators (masks $\Delta_1$ and $\Delta_2$ for panels (a), (b), and (c); panel (c) is the Sobel operator).

Fig. 3.11 Edge models for orientation and displacement sensitivity analyses.
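The difference gradient of Eqs. (3.22)–(3.24) and the Sobel operator can be sketched in pure Python. Assumptions: images are lists of rows indexed `img[y][x]` with $y$ increasing down the rows; pixels where an operator would leave the image are left at 0; and, because the mask values of Fig. 3.10 did not survive, `sobel` uses the standard published Sobel masks (column weights 1, 2, 1). Python's `math.atan2(d2, d1)` returns $\tan^{-1}(d_2/d_1)$ in the correct quadrant, which is exactly the $\operatorname{atan}(\Delta_2, \Delta_1)$ of Eq. (3.23).

```python
import math

def difference_gradient(img, n=1):
    """Eqs. (3.22)-(3.24) with span n; returns (magnitude, direction) grids."""
    rows, cols = len(img), len(img[0])
    mag = [[0.0] * cols for _ in range(rows)]
    ang = [[0.0] * cols for _ in range(rows)]
    for y in range(rows - n):
        for x in range(cols - n):
            d1 = img[y][x + n] - img[y][x]      # Delta_1
            d2 = img[y + n][x] - img[y][x]      # Delta_2
            mag[y][x] = math.hypot(d1, d2)      # Eq. (3.22)
            ang[y][x] = math.atan2(d2, d1)      # Eq. (3.23), quadrant-correct
    return mag, ang

def sobel(img):
    """3 x 3 Sobel operator (standard masks); border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    mag = [[0.0] * cols for _ in range(rows)]
    ang = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Weighted right column minus weighted left column
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            # Weighted lower row minus weighted upper row
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)
    return mag, ang

ramp_x = [[2 * x for x in range(5)] for _ in range(5)]   # intensity rises along x
print(difference_gradient(ramp_x)[0][0][0], sobel(ramp_x)[0][2][2])  # 2.0 16.0
```

On a linear ramp both operators report direction 0 (along $+x$). The Sobel magnitude is 8 times the slope because its masks sum four weighted differences taken over a span of two pixels; only relative magnitudes matter once a threshold is chosen.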

Arguments from symmetry show that only the $0 < \phi < \pi/4$ cases need be examined. Similar studies could be made using ramp edge models.

A rather specialized kind of gradient is that taken "between pixels." This scheme is shown in Fig. 3.12. Here a pixel may be thought of as having four crack edges surrounding it, whose directions are fixed by the pixel to be multiples of $\pi/2$. The magnitude of the edge is determined by $|f(\mathbf{x}) - f(\mathbf{y})|$, where $\mathbf{x}$ and $\mathbf{y}$ are the coordinates of the pixels that have the edge in common. One advantage of this formulation is that it provides an effective way of separating regions and their boundaries. The disadvantage is that the edge orientation is crude.

The Laplacian is an edge detection operator that is an approximation to the mathematical Laplacian $\partial^2 f/\partial x^2 + \partial^2 f/\partial y^2$ in the same way that the gradient is an approximation to the first partial derivatives. One version of the discrete Laplacian is given by

$$L(x, y) = f(x, y) - \tfrac{1}{4}\left[f(x, y+1) + f(x, y-1) + f(x+1, y) + f(x-1, y)\right] \tag{3.26}$$

Fig. 3.12 "Crack" edge representation.
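Eq. (3.26) is the pixel value minus the mean of its four neighbours, which is $-\tfrac{1}{4}$ times the usual five-point Laplacian stencil. A minimal sketch, assuming lists of rows and leaving border pixels at 0 (the name `laplacian` is illustrative):

```python
def laplacian(img):
    """Discrete Laplacian of Eq. (3.26) at interior pixels; borders are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            neighbours = img[y + 1][x] + img[y - 1][x] + img[y][x + 1] + img[y][x - 1]
            out[y][x] = img[y][x] - neighbours / 4
    return out

spot = [[0] * 5 for _ in range(5)]
spot[2][2] = 4                      # a single bright pixel, e.g. a noise spike
L = laplacian(spot)
print(L[2][2], L[2][1])             # 4.0 -1.0
```

A linear ramp gives exactly zero everywhere, whereas a single-pixel spike produces a full-strength response with sign changes around it, a small illustration of why the Laplacian amplifies noise.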

The Laplacian has two disadvantages as an edge measure: (1) useful directional information is not available, and (2) the Laplacian, being an approximation to the second derivative, doubly enhances any noise in the image. Because of these disadvantages, the Laplacian has fallen into disuse, although some authors have used it as an adjunct to the gradient [Wechsler and Sklansky 1977; Akatsuka 1974] in the following manner: There is an edge at $\mathbf{x}$ with magnitude $g(\mathbf{x})$ and direction $\phi(\mathbf{x})$ if $g(\mathbf{x}) > T_1$ and $L(\mathbf{x}) > T_2$.

Edge Templates

The Kirsch operator [Kirsch 1971] is related to the edge gradient and is given by

$$S(\mathbf{x}) = \max\left[1,\ \max_k \sum_{k'=k-1}^{k+1} f(\mathbf{x}_{k'})\right] \tag{3.27}$$

where $f(\mathbf{x}_k)$ are the eight neighboring pixels to $\mathbf{x}$ and where subscripts are computed modulo 8. A 3-bit direction can also be extracted from the value of $k$ that yields the maximum in (3.27). In practice, "pure" template matching has replaced the use of (3.27). Four separate templates are matched with the image and the operator reports the magnitude and direction associated with the maximum match. As one might expect, the operator is sensitive to the magnitude of $f(\mathbf{x})$, so that in practice variants using large templates are generally used. Figure 3.13 shows Kirsch-motivated templates with different spans.
Fig. 3.13 Kirsch templates (spans n = 1 and n = 2).
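The "pure" template matching just described (four templates, report the magnitude and direction of the maximum match) can be sketched as below. Because the values of Fig. 3.13 did not survive, the four 3 × 3 templates here are hypothetical Prewitt-style choices, one per 45° orientation; using the sign of the best response doubles them to eight directions, i.e., the 3-bit direction the text mentions. Directions are in degrees, counterclockwise from $+x$ with row 0 at the top; all names are illustrative.

```python
# Hypothetical 3 x 3 templates at 0, 45, 90 and 135 degrees (row 0 is the top row)
TEMPLATES = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],    # k = 0: intensity rising to the right
    [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],    # k = 1: rising toward the upper right
    [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],    # k = 2: rising upward
    [[1, 1, 0], [1, 0, -1], [0, -1, -1]],    # k = 3: rising toward the upper left
]

def template_edge(patch):
    """(magnitude, direction in degrees) of the best-matching template for a 3 x 3 patch."""
    best_k, best_r = 0, 0
    for k, t in enumerate(TEMPLATES):
        r = sum(t[i][j] * patch[i][j] for i in range(3) for j in range(3))
        if abs(r) > abs(best_r):
            best_k, best_r = k, r
    # A negative response means the same edge with the opposite sense (+180 degrees)
    direction = 45 * best_k if best_r >= 0 else 45 * best_k + 180
    return abs(best_r), direction

def template_edges(img):
    """Apply template_edge at every interior pixel; returns {(y, x): (mag, dir)}."""
    rows, cols = len(img), len(img[0])
    return {(y, x): template_edge([row[x - 1:x + 2] for row in img[y - 1:y + 2]])
            for y in range(1, rows - 1) for x in range(1, cols - 1)}

step = [[0, 0, 9, 9] for _ in range(4)]      # vertical step edge, bright on the right
print(template_edges(step))
```

For the vertical step every interior pixel reports magnitude 27 and direction 0°, while the diagonal templates respond only with 18, so the maximum cleanly selects the orientation.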



This brief discussion of edge templates should not be construed as a comment on their appropriateness or popularity. In fact, they are widely used, and the template matching concept is the essence of the other approaches. There is also evidence that the mammalian visual system responds to edges through special low-level template matching edge detectors [Hubel and Wiesel 1979].
3.3.2 Edge Thresholding Strategies

For most images there will be but few places where the gradient magnitude is equal to zero. Furthermore, in the absence of any special context, small magnitudes are most likely to be due to random fluctuations, or noise in the image function $f$. Thus in practical cases one may use the expedient of only reporting an edge element at $\mathbf{x}$ if $g(\mathbf{x})$ is greater than some threshold, in order to reduce these noise effects. This strategy is computationally efficient but may not be the best. An alternative thresholding strategy [Frei and Chen 1977] views difference operators as part of a set of orthogonal basis functions analogous to the Fourier basis of Section 2.2.4. Figure 3.14 shows the nine Frei–Chen basis functions. Using this basis, the image near a point $\mathbf{x}_0$ can be represented as

$$f(\mathbf{x}) = \sum_{k=0}^{8} \frac{(f, h_k)\, h_k(\mathbf{x} - \mathbf{x}_0)}{(h_k, h_k)} \tag{3.28}$$

where $(f, h_k)$ is the correlation operation given by

$$(f, h_k) = \sum_{\mathbf{x}\in D} f(\mathbf{x})\, h_k(\mathbf{x} - \mathbf{x}_0) \tag{3.29}$$

and $D$ is the nonzero domain of the basis functions. This operation is also regarded as the projection of the image into the basis function $h_k$. When the image can be reconstructed from the basis functions and their coefficients, the basis functions span the space. In the case of a smaller set of functions, the basis functions span a subspace. The value of a projection into any basis function is highest when the image function is identical to the basis function.

Fig. 3.14 Frei–Chen orthogonal basis.

Thus one way of measuring the "edgeness" of a local area in an image is to measure the relative projection of the image into the edge basis functions. The relative projection into the particular "edge subspace" is given by

$$\cos\theta = \left(\frac{B}{S}\right)^{1/2} \tag{3.30}$$

where

$$B = \sum_{k=1}^{4} (f, h_k)^2 \qquad \text{and} \qquad S = \sum_{k=0}^{8} (f, h_k)^2$$
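Eqs. (3.28)–(3.30) can be sketched for a single 3 × 3 patch as below. Because the values in Fig. 3.14 did not survive, the masks are the standard Frei–Chen masks as they are commonly published (four edge masks, four line masks, one average mask); their ordering may differ from the book's $h_0, \dots, h_8$ indexing. Each projection is divided by the mask norm, so the basis is orthonormal and $S$ equals the patch energy. Names are illustrative.

```python
import math

R2 = math.sqrt(2.0)
# Commonly published Frei-Chen masks (unnormalized); EDGE spans the "edge subspace"
EDGE = [
    [[1, R2, 1], [0, 0, 0], [-1, -R2, -1]],
    [[1, 0, -1], [R2, 0, -R2], [1, 0, -1]],
    [[0, -1, R2], [1, 0, -1], [-R2, 1, 0]],
    [[R2, -1, 0], [-1, 0, 1], [0, 1, -R2]],
]
OTHER = [
    [[0, 1, 0], [-1, 0, -1], [0, 1, 0]],      # line masks
    [[-1, 0, 1], [0, 0, 0], [1, 0, -1]],
    [[1, -2, 1], [-2, 4, -2], [1, -2, 1]],
    [[-2, 1, -2], [1, 4, 1], [-2, 1, -2]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],        # average
]

def projection(patch, mask):
    """(f, h_k) of Eq. (3.29), divided by the mask norm (orthonormal basis)."""
    dot = sum(patch[i][j] * mask[i][j] for i in range(3) for j in range(3))
    norm = math.sqrt(sum(m * m for row in mask for m in row))
    return dot / norm

def frei_chen_angle(patch):
    """theta of Eq. (3.30): angle between the patch and the edge subspace."""
    B = sum(projection(patch, h) ** 2 for h in EDGE)
    S = B + sum(projection(patch, h) ** 2 for h in OTHER)
    if S == 0.0:
        return math.pi / 2              # an all-zero patch has no edge component
    return math.acos(min(1.0, math.sqrt(B / S)))

def is_edge(patch, T):
    """Report an edge if theta < T."""
    return frei_chen_angle(patch) < T

print(math.degrees(frei_chen_angle([[0, 0, 9]] * 3)))
```

A zero-mean step such as `[[-1, 0, 1]] * 3` lies entirely in the edge subspace ($\theta = 0$), whereas `[[0, 0, 9]] * 3` makes an angle of 45° with it because half of its energy falls on the non-edge masks. A uniform patch is at 90° however bright it is, which is the point of the relative measure.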

Thus if $\theta < T$, report an edge; otherwise, not. Figure 3.15 shows the potential advantage of this technique compared to the technique of thresholding the gradient magnitude, using two hypothetical projections $B_1$ and $B_2$. Even though $B_2$ has a small magnitude, its relative projection into edge subspace is large and thus would be counted as an edge with the Frei–Chen criterion. This is not true for $B_1$.

Under many circumstances it is appropriate to use model information about the image edges. This information can affect the way the edges are interpreted after they have been computed, or it may affect the computation process itself. As an example of the first case, one may still use a gradient operator, but vary the threshold for reporting an edge. Many versions of the second, more extreme strategy of using special spatially variant detection methods have been tried [Pingle and Tenenbaum 1971; Griffith 1973; Shirai 1975]. The basic idea is illustrated in Fig. 3.16. Knowledge of the orientation of an edge allows a special orientation-sensitive operator to be brought to bear on it.

3.3.3 Three-Dimensional Edge Operators

In many imaging applications, particularly medicine, the images are three-dimensional. Consider the examples of the reconstructed planes described in Sections 1.1 and 2.3.4. The medical scanner that acquires these data follows several parallel image planes, effectively producing a three-dimensional volume of data.

Fig. 3.15 Comparison of thresholding techniques (panels (a) and (b); edge subspace and threshold T).


Fig. 3.16 Model-directed edge detection.

In three-dimensional data, boundaries of objects are surfaces. Edge elements in two dimensions become surface elements in three dimensions. The two-dimensional image gradient, when generalized to three dimensions, is the local surface normal. Just as in the two-dimensional case, many different basis operators can be used [Liu 1977; Zucker and Hummel 1979]. That of Zucker and Hummel uses an optimal basis assuming an underlying continuous model. We shall just describe the operator here; the proof of its correctness given the continuous image model may be found in the reference. The basis functions for the three-dimensional operator are given by

$$g_1(x, y, z) = \frac{x}{r}, \qquad g_2(x, y, z) = \frac{y}{r}, \qquad g_3(x, y, z) = \frac{z}{r} \tag{3.31}$$

where $r = (x^2 + y^2 + z^2)^{1/2}$. The discrete form of these operators is shown in Fig. 3.17 for a 3 × 3 × 3 pixel domain $D$. Only $g_1$ is shown since the others are obvious by symmetry. To apply the operator at a point $(x_0, y_0, z_0)$, compute projections $a$, $b$, and $c$, where

$$a = (g_1, f) = \sum_{\mathbf{x}\in D} g_1(\mathbf{x})\, f(\mathbf{x} - \mathbf{x}_0), \qquad b = (g_2, f), \qquad c = (g_3, f) \tag{3.32}$$

The result of these computations is the surface normal $\mathbf{n} = (a, b, c)$ at $(x_0, y_0, z_0)$. Surface thresholding is analogous to edge thresholding: Report a surface element only if $s(x, y, z) = |\mathbf{n}|$ exceeds some threshold. Figure 3.18 shows the results of applying the operator to a synthetic three-dimensional image of a torus. The display shows small detected surface patches.

Fig. 3.17 The 3 × 3 × 3 edge basis function $g_1(x, y, z)$.
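The Zucker–Hummel operator of Eqs. (3.31) and (3.32) can be sketched over a 3 × 3 × 3 neighbourhood, with the discrete basis values $x/r$, $y/r$, $z/r$ computed directly from the offsets (so $g_1$ is ±1, ±√2/2, or ±√3/3 away from the $x = 0$ plane, as in Fig. 3.17). One assumption to flag: the volume is indexed `vol[z][y][x]` and sampled as $f(\mathbf{x}_0 + \mathbf{d})$ for offsets $\mathbf{d}$ in $D$, so that the normal points toward increasing intensity; this sign convention is our choice. Function names are hypothetical.

```python
import math

def zucker_hummel_normal(vol, x0, y0, z0):
    """Surface normal (a, b, c) of Eq. (3.32) at an interior voxel; vol[z][y][x]."""
    a = b = c = 0.0
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dx == dy == dz == 0:
                    continue            # centre voxel: r = 0, excluded
                r = math.sqrt(dx * dx + dy * dy + dz * dz)
                v = vol[z0 + dz][y0 + dy][x0 + dx]
                a += dx / r * v         # g1 = x / r
                b += dy / r * v         # g2 = y / r
                c += dz / r * v         # g3 = z / r
    return a, b, c

def is_surface_element(vol, x0, y0, z0, T):
    """Report a surface element only if |n| exceeds the threshold T."""
    a, b, c = zucker_hummel_normal(vol, x0, y0, z0)
    return math.sqrt(a * a + b * b + c * c) > T

# Intensity increasing along z only: the normal should point along +z
ramp_z = [[[z for x in range(3)] for y in range(3)] for z in range(3)]
print(zucker_hummel_normal(ramp_z, 1, 1, 1))
```

For the $z$-ramp, the $a$ and $b$ sums cancel pairwise and $c = 2 + 8/\sqrt{2} + 8/\sqrt{3} \approx 12.28$; a uniform volume gives a zero normal and is never reported as a surface.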
Fig. 3.18 Results of applying the Zucker–Hummel 3-D operator to synthetic image data in the shape of a torus.

3.3.4 How Good Are Edge Operators?

The plethora of edge operators is very difficult to compare and evaluate. For example, some operators may find most edges but also respond to noise; others may be noise-insensitive but miss some crucial edges. The following figure of merit [Pratt 1978] may be used to compare edge operators:

$$F = \frac{1}{\max(N_A, N_I)} \sum_{i=1}^{N_A} \frac{1}{1 + \alpha d_i^2} \tag{3.33}$$

where $N_A$ and $N_I$ represent the number of actual and ideal edge points, respectively, $\alpha$ is a scaling constant, and $d$ is the signed separation distance of an actual edge point normal to a line of ideal edge points. The term $\alpha d^2$ penalizes detected edges which are offset from their true position; the penalty can be adjusted via $\alpha$. Using this measure, all operators have surprisingly similar behaviors. Unsurprisingly, the performance of each deteriorates in the presence of noise [Abdou 1978]. (Pratt defines a signal-to-noise ratio as the square of the step edge amplitude divided by the square of the standard deviation of Gaussian white noise, $h^2/\sigma^2$.) Figure 3.19 shows some typical curves for different operators. To make this figure, the threshold for reporting an edge was chosen independently for each operator so as to maximize Eq. (3.33).

These comparisons are important as they provide a gross measure of differences in performance of operators even though each operator embodies a specific edge model and may be best in special circumstances. But perhaps the more important point is that since all real-world images have significant amounts of noise, all edge operators will generally produce imperfect results. This means that in considering the overall computer vision problem, that of building descriptions of objects, the efforts are usually best spent in developing methods that can use or improve the measurements from unreliable edges rather than in a search for the ideal edge detector.

Fig. 3.19 Edge operator performance using Pratt's measure (Eq. 3.33), plotted against the signal-to-noise ratio $h^2/\sigma^2$.


3.3.5 Edge Relaxation

One way to improve edge operator measurements is to adjust them based on measurements of neighboring edges. This is a natural thing to want to do: If a weak horizontal edge is positioned between two strong horizontal edges, it should gain credibility. The edges can be adjusted based on local information using parallel iterative techniques. This sort of process is related to more global analysis and is complementary to sequential approaches such as edge tracking (Chapter 4).

Early cooperative edge detection techniques used pairwise measurements between pixels [Zucker et al. 1977]. A later version [Prager 1980] allows for more complicated adjustment formulas. In describing the edge relaxation scheme, we essentially follow Prager's development and use the crack edges described at the end of the discussion on gradients (Sec. 3.3.1). The development can be extended to the other kinds of edges and the reader is invited to do just this in the Exercises.

The overall strategy is to recognize local edge patterns which cause the confidence in an edge to be modified. Prager recognizes three groups of patterns: patterns where the confidence of an edge can be increased, decreased, or left the same. The overall structure of the algorithm is as follows:

Algorithm 3.1 Edge Relaxation

0. Compute the initial confidence of each edge $C^0(e)$ as the gradient magnitude normalized by the maximum gradient magnitude in the image.
1. $k = 1$;
2. Compute each edge type based on the confidence of edge neighbors;
3. Modify the confidence of each edge $C^k(e)$ based on its edge type and its previous confidence $C^{k-1}(e)$;
4. Test the $C^k(e)$'s to see if they have all converged to either 0 or 1. If so, stop; else, increment $k$ and go to 2.

The two important parts of the algorithm are step 2, computing the edge type, and step 3, modifying the edge confidence.

The edge type classification relies on the notation for edges (Fig. 3.20). The edge type is a concatenation of the left and right vertex types. Vertex types are computed from the strength of edges emanating from a vertex. Vertical edges are handled in the same way, exploiting the obvious symmetries with the horizontal case. Besides the central edge $e$, the left vertex is the endpoint for three other possible edges. Classifying these possible edges into "edge" and "no edge" provides the underpinnings for the vertex types in Fig. 3.21.

Fig. 3.20 Edge notation. (a) Edge position with no edge. (b) Edge position with edge. (c) Edge to be updated. (d) Edge of unknown strength. (e) Configuration of edges around a central edge e.

To compute vertex type, choose the maximum confidence vertex, i.e., the vertex is type $j$ where $j$ maximizes $\text{conf}(j)$ and

$$\begin{aligned}
\text{conf}(0) &= (m - a)(m - b)(m - c)\\
\text{conf}(1) &= a(m - b)(m - c)\\
\text{conf}(2) &= ab(m - c)\\
\text{conf}(3) &= abc
\end{aligned}$$

where $m = \max(a, b, c, q)$, $q$ is a constant (0.1 is about right), and $a$, $b$, and $c$ are the normalized gradient magnitudes for the three edges. Without loss of generality, $a \ge b \ge c$. The parameter $m$ adjusts the vertex classification so that it is relative to the local maximum. Thus $(a, b, c) = (0.25, 0.01, 0.01)$ is a type 1 vertex. The parameter $q$ forces weak vertices to type zero [e.g., $(0.01, 0.001, 0.001)$ is type zero].

Once the vertex type has been computed, the edge type is simple. It is merely the concatenation of the two vertex types. That is, the edge type is $(ij)$, where $i$ and $j$ are the vertex types. (From symmetry, only consider $i \le j$.)

Fig. 3.21 Classification of vertex type of the left-hand endpoint of edge e, Fig. 3.20.

Decisions in the second important part of the algorithm, modifying edge confidence based on edge type, appear in Table 3.1. The updating formula is:

$$\begin{aligned}
\text{increment:}\quad & C^{k+1}(e) = \min\left(1,\ C^k(e) + \delta\right)\\
\text{decrement:}\quad & C^{k+1}(e) = \max\left(0,\ C^k(e) - \delta\right)\\
\text{leave as is:}\quad & C^{k+1}(e) = C^k(e)
\end{aligned}$$

where $\delta$ is a constant (values from 0.1 to 0.3 are appropriate). The result of using the relaxation scheme is shown in Fig. 3.22. The figures on the left-hand side show the edges with normalized magnitudes greater than 0.25. Weak edges cause many gaps in the boundaries. The figures on the right side show the results of five iterations of edge relaxation. Here the confidence of the weak edges has been increased owing to the proximity of other edges, using the rules in Table 3.1.

Fig. 3.22 Edge relaxation results. (a) Raw edge data. Edge strengths have been thresholded at 0.25 for display purposes only. (b) Results after five iterations of relaxation applied to (a). (c) Different version of (a). Edge strengths have been thresholded at 0.25 for display purposes only. (d) Results after five iterations of relaxation applied to (c).


Table 3.1

Decrement: 00, 02, 03
Increment: 11, 12, 13
Leave as is: 01, 22, 23, 33
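The two key parts of Algorithm 3.1, vertex and edge typing (step 2) and the confidence update of Table 3.1 (step 3), can be sketched as small functions. The outer loop over the crack-edge grid, and the bookkeeping of which three edges meet at each vertex, are omitted; $\delta = 0.2$ is picked from the suggested 0.1 to 0.3 range and $q = 0.1$ as the text suggests. All names are hypothetical.

```python
INCREMENT = {"11", "12", "13"}   # Table 3.1
DECREMENT = {"00", "02", "03"}
# Every other edge type ("01", "22", "23", "33") is left as is.

def vertex_type(a, b, c, q=0.1):
    """Vertex type 0-3 from the confidences of the three other edges at a vertex."""
    a, b, c = sorted((a, b, c), reverse=True)   # enforce a >= b >= c
    m = max(a, b, c, q)
    conf = [(m - a) * (m - b) * (m - c),
            a * (m - b) * (m - c),
            a * b * (m - c),
            a * b * c]
    return max(range(4), key=lambda j: conf[j])

def edge_type(left, right):
    """Concatenate the two vertex types, smaller first (i <= j by symmetry)."""
    i, j = sorted((left, right))
    return f"{i}{j}"

def update(confidence, etype, delta=0.2):
    """One application of the updating formula to C^k(e), clamped to [0, 1]."""
    if etype in INCREMENT:
        return min(1.0, confidence + delta)
    if etype in DECREMENT:
        return max(0.0, confidence - delta)
    return confidence

print(vertex_type(0.25, 0.01, 0.01), vertex_type(0.01, 0.001, 0.001))
```

The two calls reproduce the worked examples in the text (a type 1 vertex and a type 0 vertex). A weak edge whose two endpoints each continue into exactly one strong neighbour has type "11" and gains $\delta$ per iteration, which is how gaps like those in Fig. 3.22 get filled.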

3.4 RANGE INFORMATION FROM GEOMETRY

Neither the perspective nor the orthogonal projection operation, which take the three-dimensional world to a two-dimensional image, is invertible in the usual sense. Since projection maps an infinite line onto a point in the image, information is lost. For a fixed viewpoint and direction, infinitely many continuous and discontinuous three-dimensional configurations of points could project on our retina in an image of, say, our grandmother. Simple cases are grandmothers of various sizes cleverly placed at varying distances so as to project onto the same area. An astronomer might imagine millions of points distributed perhaps through light-years of space which happen to line up into a "grandmother constellation." All that can be mathematically guaranteed by imaging geometry is that the image point corresponds to one of the infinite number of points on that three-dimensional line of sight. The "inverse perspective" transformation (Appendix 1) simply determines the equation of the infinite line of sight from the parameters of the imaging process modeled as a point projection.

However, a line and a plane not including it (and not parallel to it) intersect in just one point. Lines of sight are easy to compute, and so it is possible to tell where any image point projects onto any known plane (the supporting ground or table plane is a favorite). Similarly, if two images from different viewpoints can be placed in correspondence, the intersection of the lines of sight from two matching image points determines a point in three-space. These simple observations are the basis of light-striping ranging (Section 2.3.3) and are important in stereo imaging.
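The observation that a line of sight meets a known plane in exactly one point can be sketched with a simple pinhole model. The camera model here is an assumption for illustration, not the Appendix 1 formulation: the lens center is at the origin, the camera looks along $+z$, and the image plane is at $z = f$, so the line of sight through image point $(u, v)$ is $\mathbf{X} = t\,(u, v, f)$. The plane is written $\mathbf{n}\cdot\mathbf{X} = d$; the function name is hypothetical.

```python
def backproject_to_plane(u, v, focal, normal, d):
    """Point where the line of sight through image point (u, v) meets the plane n . X = d.

    Returns None if the line of sight is parallel to the plane or if the
    intersection lies behind the camera (t <= 0).
    """
    ray = (u, v, focal)                               # direction of the line of sight
    denom = sum(n * r for n, r in zip(normal, ray))   # n . ray
    if denom == 0:
        return None                                   # parallel: no single intersection
    t = d / denom
    if t <= 0:
        return None                                   # plane is behind the camera along this ray
    return tuple(t * r for r in ray)

# Ground plane y = -1.5 (camera 1.5 units above it, y pointing up)
print(backproject_to_plane(0.1, -0.3, 1.0, (0.0, 1.0, 0.0), -1.5))
```

An image point below the center of view lands on the ground plane about 5 units in front of the camera, at roughly (0.5, -1.5, 5.0); a point exactly on the horizon ($v = 0$) gives a line of sight parallel to the ground and no intersection.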
3.4.1 Stereo Vision and Triangulation

One of the first ideas that occurs to one who wants to do three-dimensional sensing is the biologically motivated one of stereo vision. Two cameras, or one camera from two positions, can give relative depth or absolute three-dimensional location, depending on the elaboration of the processing and measurement. There has been considerable effort in this direction [Moravec 1977; Quam and Hannah 1974; Binford 1971; Turner 1974; Shapira 1974]. The technique is conceptually simple:

1. Take two images separated by a baseline.
2. Identify points between the two images.
3. Use the inverse perspective transform (Appendix 1) or simple triangulation (Section 2.2.2) to derive the two lines on which the world point lies.
4. Intersect the lines.

The resulting point is in three-dimensional world coordinates.

The hardest part of this method is step 2, that of identifying corresponding points in the two images. One way of doing this is to use correlation, or template matching, as described in Section 3.2.1. The idea is to take a patch of one image and match it against the other image, finding the place of best match in the second image, and assigning a related "disparity" (the amount the patch has been displaced) to the patch.

Correlation is a relatively expensive operation, its naive implementation requiring $O(n^2 m^2)$ multiplications and additions for an $m \times m$ patch and $n \times n$ image. This requirement can be drastically improved by capitalizing on the idea of variable resolution; the improved technique is described in Section 3.7.2.

Efficient correlation is of technological concern, but even if it were free and instantaneous, it would still be inadequate. The basic problems with correlation in stereo imaging have to do with the fact that things can look significantly different from different points of view. It is possible for the two stereo views to be sufficiently different that corresponding areas may not be matched correctly. Worse, in scenes with much obscuration, very important features of the scene may be present in only one view. This problem is alleviated by decreasing the baseline, but of course then the accuracy of depth determinations suffers; at a baseline length of zero there is no problem, but no stereo either. One solution is to identify world features, not image appearance, in the two views, and match those (the nose of a person, the corner of a cube). However, if three-dimensional information is sought as a help in perception, it is unreasonable to have to do perception first in order to do stereo.
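The four-step procedure with patch matching in step 2 can be sketched for the simplest geometry: two parallel-axis cameras whose image rows are aligned (rectified), where intersecting the two lines of sight reduces to the standard relation $Z = f b / d$ for focal length $f$, baseline $b$, and disparity $d$. That geometry, the use of sum of squared differences (SSD) in place of correlation, and the numeric camera parameters are all assumptions for illustration.

```python
def best_disparity(left, right, x, half, max_d):
    """Disparity d minimizing the SSD between left[x-half..x+half] and right shifted by -d."""
    patch = left[x - half:x + half + 1]
    best_d, best_ssd = None, float("inf")
    for d in range(max_d + 1):
        lo = x - d - half
        if lo < 0:
            break                               # candidate window would leave the image
        cand = right[lo:lo + 2 * half + 1]
        ssd = sum((p - q) ** 2 for p, q in zip(patch, cand))
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d

def depth_from_disparity(d, focal, baseline):
    """Z = focal * baseline / d for parallel, rectified cameras (assumed geometry)."""
    return float("inf") if d == 0 else focal * baseline / d

left = [0, 0, 0, 0, 0, 0, 9, 5, 0, 0, 0, 0]
right = left[2:] + [0, 0]                       # the same scan line, displaced 2 pixels
d = best_disparity(left, right, x=6, half=1, max_d=4)
print(d, depth_from_disparity(d, focal=500.0, baseline=0.1))
```

The feature at left position 6 is found at right position 4, giving disparity 2 and depth 25 (in the units of the baseline). A zero disparity maps to infinite depth, the zero-baseline limit the text mentions. The search cost per point grows with patch size times disparity range, which is why the text looks for cheaper matching strategies.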
3.4.2 A Relaxation Algorithm for Stereo

Human stereopsis, or fusing the inputs from the eyes into a stereo image, does not necessarily involve being aware of features to match in either view. Most human beings can readily fuse stereo pairs that individually consist of randomly placed dots. They can thus perceive three-dimensional shapes without recognizing monocular cues in either image. For example, consider the stereo pair of Fig. 3.23. In either frame by itself, nothing but a randomly speckled rectangle can be perceived. All the stereo information is present in the relative displacement of dots in the two rectangles. To make the right-hand member of the stereo pair, a patch of
Sec. 3.4 Range Formation from Geometry


Fig. 3.23 A random-dot stereogram.

the randomly placed dots of the left-hand image is displaced sideways. The dots that are thus covered are lost, and the space left by displacing the patch is filled in with random dots.

Interestingly enough, a very simple algorithm [Marr and Poggio 1976] can be formulated that computes disparity from random-dot stereograms. First consider the simpler problem of matching one-dimensional images of four points, as depicted in Fig. 3.24. Only one depth plane allows all four points to be placed in correspondence, but smaller numbers of points can be matched in other planes. The crux of the algorithm lies in its rules, which help determine, on a local basis, the appropriateness of a match. Two rules arise from the observation that most images are of opaque objects with smooth surfaces and depth discontinuities only at object boundaries:

1. Each point in an image may have only one depth value.
2. A point is almost sure to have a depth value near the values of its neighbors.
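The construction just described can be sketched in Python. A patch of the left image is moved sideways to form the right image, the dots it lands on are overwritten, and the vacated strip is refilled with fresh random dots. The half-open box convention, the leftward direction of the shift, and the parameter names are assumptions made for illustration.

```python
import random


def random_dot_stereogram(width, height, box, shift, seed=0):
    """Build a left/right pair of 0/1 dot images (lists of rows).

    box = (x0, y0, x1, y1) is the half-open patch that is moved `shift`
    pixels to the left in the right-hand image.
    """
    x0, y0, x1, y1 = box
    if x0 < shift or shift > x1 - x0:
        raise ValueError("patch must stay inside the image")
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    right = [row[:] for row in left]
    for y in range(y0, y1):
        # Displace the patch; the dots it lands on are lost.
        for x in range(x0, x1):
            right[y][x - shift] = left[y][x]
        # Fill the strip uncovered by the move with fresh random dots.
        for x in range(x1 - shift, x1):
            right[y][x] = rng.randint(0, 1)
    return left, right


left, right = random_dot_stereogram(20, 10, (6, 3, 14, 7), shift=2, seed=1)
```

Neither image shows the patch on its own. Only the relative displacement of 2 pixels between the two images distinguishes the patch from its surround.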

Fig. 3.24 The stereo matching problem.



Figure 3.24 can be viewed as a binary network in which each possible match is represented by a binary state. Matches have value 1 and nonmatches value 0. Figure 3.25 shows an expanded version of Fig. 3.24. The connections between alternative matches for a point inhibit each other, and the connections between matches of equal depth reinforce each other. To extend this idea to two dimensions, use parallel arrays for different values of y, where equal-depth matches have reinforcing connections. The extended array is thus modeled as the matrix C(x, y, d). The point x, y, d corresponds to a particular match between a point $(x_1, y_1)$ in the right image and a point $(x_2, y_1)$ in the left image. The stereopsis algorithm produces a series of matrices $C_n$ that converges to the correct solution in most cases. The initial matrix $C_0(x, y, d)$ has the value one where x, y, d correspond to a match in the original data and the value zero otherwise.

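Algorithm 3.2 [Marr and Poggio 1976]

Until C satisfies some convergence criterion, do

$$C_{n+1}(x,y,d) = \sigma\left\{ \sum_{(x',y',d') \in S} C_n(x',y',d') \;-\; \sum_{(x',y',d') \in \bar{S}} C_n(x',y',d') \;+\; C_0(x,y,d) \right\} \qquad (3.34)$$

where the term in braces is handled as follows:

$$\sigma(t) = \begin{cases} 1 & \text{if } t > T \\ 0 & \text{otherwise} \end{cases}$$

$S$ = set of points x', y', d' such that $|x - x'| \le 1$ and $d = d'$
$\bar{S}$ = set of points x', y', d' such that $|x - x'| \le 1$ and $|d - d'| = 1$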

Fig. 3.25 Extension of stereo matching. (Axes: x and disparity; nodes mark a match between x and x'; legend: inhibitory connection, excitatory connection.)
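Below is a direct, unoptimized Python transcription of Eq. (3.34) for a single image row, so the y index is dropped. It uses the neighborhoods $S$ and $\bar{S}$ exactly as defined above. Iteration stops when the number of entries that change in one pass falls below a tolerance, which is one possible convergence criterion. The layout c[x][d], the function name, and the parameter names are assumptions for this sketch.

```python
def marr_poggio_row(c0, T, change_tol=1, max_iter=100):
    """Iterate Eq. (3.34) on one image row.

    c0[x][d] is 1 where point x could match at disparity d, else 0.
    Excitation comes from S (|x - x'| <= 1, d' = d); inhibition comes from
    S-bar (|x - x'| <= 1, |d - d'| = 1).
    """
    W, D = len(c0), len(c0[0])
    c = [col[:] for col in c0]
    for _ in range(max_iter):
        new = [[0] * D for _ in range(W)]
        changed = 0
        for x in range(W):
            for d in range(D):
                t = c0[x][d]                           # the C_0 term
                for xp in range(max(0, x - 1), min(W, x + 2)):
                    t += c[xp][d]                      # excitatory set S
                    for dp in (d - 1, d + 1):          # inhibitory set S-bar
                        if 0 <= dp < D:
                            t -= c[xp][dp]
                new[x][d] = 1 if t > T else 0          # threshold function sigma
                changed += new[x][d] != c[x][d]
        c = new
        if changed < change_tol:
            break
    return c


# Six points all consistent with disparity 1, plus one spurious match.
c0 = [[0, 1, 0, 0] for _ in range(6)]
c0[2][2] = 1
result = marr_poggio_row(c0, T=2)
print([col.index(1) for col in result])   # [1, 1, 1, 1, 1, 1]
```

With T = 2, the spurious match receives net inhibition from its consistent neighbors and is switched off on the first pass. The run at disparity 1 is self-sustaining, so the second pass changes nothing and the loop stops.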



One convergence criterion is that the number of points modified on an iteration must be less than some threshold T. Fig. 3.26 shows the results of this computation; the disparity is encoded as a gray level and displayed as an image for different values of n. A more general version of this algorithm matches image features such as edges rather than points (in the random-dot stereogram, the only features are

Fig. 3.26 The results of relaxation computations for stereo.

points), but the principles are the same. The extraction of features more complicated than edges or points is itself a thorny problem and is the subject of Part II. Marr and Poggio have since refined their stereopsis algorithm to agree better with psychological data [Marr and Poggio 1977].

3.5 SURFACE ORIENTATION FROM REFLECTANCE MODELS

The ordinary visual world is mostly composed of opaque three-dimensional objects. The intensity (gray level) of a pixel in a digital image is produced by the light reflected by a small area of surface near the corresponding point on the object.

It is easiest to get consistent shape (orientation) information from an image if the lighting and surface reflectance do not change from one scene location to another. Analytically, it is possible to treat such lighting as uniform illumination, a point source at infinity, or an infinite linear source. Practically, the human shape-from-shading transform is relatively robust. Of course, the perception of shape may be manipulated by changing the surface shading in calculated ways. In part, cosmetics work by changing the reflectivity properties of the skin and misdirecting our human shape-from-shading algorithms.

The recovery transformation to obtain information about surface orientation is possible if some information about the light source and the object's reflectivity is known. General algorithms to obtain and quantify this information are complicated, but practical simplifications can be made [Horn 1975; Woodham 1978; Ikeuchi 1980]. The main complicating factor is that, even with mathematically tractable object surface properties, a single image intensity does not uniquely define the surface orientation. We shall study two ways of overcoming this difficulty. The first algorithm uses intensity images as input and determines the surface orientation by using multiple light source positions to remove ambiguity in surface orientation. The second algorithm uses a single source but exploits constraints between neighboring surface elements. Such an algorithm assigns initial ranges of orientations to surface elements (actually to their corresponding image pixels) on the basis of intensity. The neighboring orientations are "relaxed" against each other until each converges to a unique orientation (Section 3.5.4).
3.5.1 Reflectivity Functions

For all these derivations, consider a distant point source of light impinging on a small patch of surface; several angles from this situation are important (Fig. 3.27). A surface's reflectance is the fraction of a given incident energy flux (irradiance) that it reflects in any given direction. Formally, the reflectivity function is defined as $r = dL/dE$, where L is exitant radiance and E is incident flux. In general, for isotropic reflecting surfaces, the reflectivity function (hence L) is a function of all three angles i, e, and g. The quantity of interest to us is image irradiance, which is proportional to scene radiance, given by $L = \int r\, dE$. In general, the evaluation of this integral can be quite complicated, and the reader is referred to [Horn and
Sec.3.5 Surface Orientation from Reflectance Models


Fig. 3.27 Important reflectance angles: i, incidence; e, emittance; g, phase.

Sjoberg 1978] for a more detailed study. For our purposes we consider surfaces with simple reflectivity functions.

Lambertian surfaces, those with an ideal matte finish, have a very simple reflectivity function that is proportional only to the cosine of the incident angle. Under uniform or collimated illumination, these surfaces look equally bright from any direction. This is because the amount of light reflected from a unit area goes down as the cosine of the viewing angle. Meanwhile, the amount of area seen in any solid angle goes up as the reciprocal of the cosine of the viewing angle. Thus the perceived intensity of a surface element is constant with respect to viewer position.

Other surfaces with simple reflectivity functions are "dusty" and "specular" surfaces. An example of a dusty surface is the lunar surface, which reflects in all directions equally. Specular (purely mirrorlike) surfaces such as polished metal reflect only at the angle where angle of reflection = angle of incidence. They also reflect only in a direction such that the incidence, normal, and emittance vectors are coplanar. Most smooth things have a specular component to their reflection, but in general some light is reflected at all angles, in decreasing amounts away from the specular angle. One way to achieve this effect is to use the cosine of the angle between the predicted specular direction and the viewing direction. This cosine is given by C, where

$$C = 2\cos(i)\cos(e) - \cos(g)$$

This quantity is unity in the pure specular direction and falls off to zero at π/2 radians away from it. Convincing specular contributions of greater or lesser sharpness are produced by taking powers of C. A simple radiance formula that allows the simulation of both matte and specular effects is

$$L(i, e, g) = s\,C^n + (1 - s)\cos(i) \qquad (3.35)$$

Here s varies between 0 and 1 and determines the fraction of specularly reflected light; n determines the sharpness of specularity peaks. As n increases, the specular peak gets sharper and sharper. Computer graphics research is constantly extending the frontiers of realistic and detailed reflectance, refractance, and illumination calculations [Blinn 1978; Phong 1975; Whitted 1980].
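Eq. (3.35) can be evaluated directly, as the Python sketch below does. C is computed from the three angles with the formula above. One assumption goes beyond Eq. (3.35): negative values of C and of cos(i) are clamped to zero. Without the clamp, an even power n would turn a direction far from the mirror direction into a spurious highlight.

```python
import math


def specular_cosine(i, e, g):
    """C = 2 cos(i) cos(e) - cos(g): cosine of the angle between the mirror
    direction and the viewing direction (angles in radians)."""
    return 2.0 * math.cos(i) * math.cos(e) - math.cos(g)


def radiance(i, e, g, s, n):
    """Eq. (3.35): L = s*C**n + (1 - s)*cos(i), with the clamping noted above."""
    c = max(specular_cosine(i, e, g), 0.0)
    return s * c ** n + (1.0 - s) * max(math.cos(i), 0.0)


# Mirror geometry: i = e = 30 degrees, coplanar, so g = 60 degrees and C = 1.
print(round(radiance(math.pi / 6, math.pi / 6, math.pi / 3, s=0.5, n=20), 6))   # 0.933013
```

With s = 0, the formula reduces to the Lambertian cos(i) term. Raising n leaves the value at the mirror direction unchanged but makes it fall off faster away from it.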
3.5.2 Surface Gradient

The reflectance functions described above are defined in terms of angles measured with respect to a local coordinate frame. For our development, it is more useful to relate the reflectivity function to surface gradients measured with respect to a viewer-oriented coordinate frame. The concept of gradient space, which is defined in a viewer-oriented frame [Horn 1975], is extremely useful in understanding the recovery transformation algorithm for the surface normal. This gradient refers to the orientation of a physical surface, not to local intensities. It must not be confused with the intensity gradients discussed in Section 3.3 and elsewhere in this book.

Gradient space is a two-dimensional space of slants of scene surfaces. It measures a basic "intrinsic" (three-dimensional) property of surfaces. Consider the point-projection imaging geometry of Fig. 2.2, with the viewpoint at infinity (far from the scene relative to the scene dimensions). The image projection is then orthographic, not perspective. The surface gradient is defined for a surface expressed as z = f(x, y). The gradient is a vector (p, q), where

$$p = \frac{\partial(-z)}{\partial x}, \qquad q = \frac{\partial(-z)}{\partial y} \qquad (3.36)$$

Any plane in the image (such as the face plane of a polyhedral face) may be expressed in terms of its gradient. The general plane equation is

$$Ax + By + Cz + D = 0 \qquad (3.37)$$

Thus

$$-z = \frac{A}{C}x + \frac{B}{C}y + \frac{D}{C} \qquad (3.38)$$

and from (3.36) the gradient may be related to the plane equation:

$$-z = px + qy + K \qquad (3.39)$$

Gradient space is thus the two-dimensional space of (p, q) vectors. The p and q axes are often considered to be superimposed on the x and y image-plane coordinate axes. Then the (p, q) vector is "in the direction" of the surface slant of imaged surfaces. Any plane perpendicular to the viewing direction has a (p, q) vector of (0, 0). Vectors on the q (or y) axis correspond to planes tilted about the x axis in an "upward" or "downward" ("y-ward") direction (like the tilt of a dressing-table


mirror). The direction $\arctan(q/p)$ is the direction of fastest change of surface depth (z) as x and y change, and $(p^2 + q^2)^{1/2}$ is the rate of this change. For instance, a vertical plane "edge-on" to the viewer has a (p, q) of (∞, 0).

The reflectance map R(p, q) represents the variation of perceived brightness with surface orientation. R(p, q) gives scene radiance (Section 2.2.3) as a function of surface gradient (in our usual viewer-centered coordinate system). (Figure 3.27 showed the situation and defined some important angles.) R(p, q) is usually shown as contours of constant scene radiance (Fig. 3.28). The following are a few useful cases.

In the case of a Lambertian surface with the source in the direction of the viewer (i = e), the gradient space image looks like Fig. 3.28. Remember that Lambertian surfaces have constant intensity for a constant illumination angle. These constant angles occur on the concentric circles of Fig. 3.28, since the direction of tilt does not affect the magnitude of the angle. The brightest surfaces are those illuminated from a normal direction: they face the viewer, so their gradients are (0, 0).

Working this out from first principles, the incident angle and emittance angle are the same in this case, since the light is near the viewer. Both are the angle between the surface normal and the view vector. Looking at the x-y plane means a vector to the light source of (0, 0, 1), and at a gradient point (p, q), the surface normal is (p, q, 1). Also,

$$R = r_0 \cos i \qquad (3.40)$$

Fig. 3.28 Contours of constant radiance in gradient space for Lambertian surfaces; single light source near the viewpoint.

where $r_0$ is a proportionality constant, and we conventionally use R to denote radiance in a viewer-centered frame. Let $\mathbf{n}_s$ and $\mathbf{n}$ be unit vectors in the source and surface normal directions. Here $\mathbf{n}_s = (0, 0, 1)$ and $\mathbf{n} = (p, q, 1)/(1 + p^2 + q^2)^{1/2}$, and since $\cos i = \mathbf{n}_s \cdot \mathbf{n}$,

$$R = \frac{r_0}{(1 + p^2 + q^2)^{1/2}} \qquad (3.41)$$

Thus cos(i) determines the image brightness, and so a plot of it is the gradient space image (Figs. 3.29 and 3.30). For a more general light position, the mathematics is the same. If the light source is in the $(p_s, q_s, 1)$ direction, take the dot product of this direction and the surface normal:

$$R = r_0\, \mathbf{n} \cdot \mathbf{n}_s$$

or, in other words,

$$R = \frac{r_0\,(p_s p + q_s q + 1)}{\left[(1 + p_s^2 + q_s^2)(1 + p^2 + q^2)\right]^{1/2}} \qquad (3.42)$$

The phase angle g is constant throughout gradient space with orthographic projection (viewer distant from scene) and a light source distant from the scene.

Setting R constant to obtain contour lines gives a second-order equation, producing conic sections. In fact, the contours are produced by a set of cones of varying angles whose axis is in the direction of the light source. These cones intersect a plane at unit distance from the origin. The resulting contours appear in Fig. 3.29. Here the dark line is the terminator, and it represents all those planes that are edge

Fig. 3.29 Contours of constant radiance in gradient space. Lambertian surfaces; light not near viewpoint.




on to the light source. Gradients on the back side of the terminator represent self-shadowed surfaces (facing away from the light). One intensity determines a contour and so gives a cone whose tangent planes all have that emittance. For a surface with specularity, contours of constant L(i, e, g) could appear as in Fig. 3.30.

The point of specularity lies between the gradient of maximum matte brightness and the origin. The brightest matte surface has its normal pointing at the light source, while the origin corresponds to a normal pointing at the viewer. Pure specular reflection can occur if the normal tilts halfway toward the viewer while maintaining the direction of tilt. Thus its gradient is on a line between the origin and the gradient point of the light source direction.
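Eq. (3.42) can be tabulated in gradient space with a short Python function. It returns the Lambertian reflectance map for a distant source in direction $(p_s, q_s, 1)$. Gradients beyond the terminator, where the dot product is negative, are treated as self-shadowed and given zero radiance. This clamping is an assumption of the sketch, and the function name is illustrative.

```python
import math


def reflectance_map(p, q, ps, qs, r0=1.0):
    """Lambertian R(p, q) of Eq. (3.42) for a source in direction (ps, qs, 1);
    self-shadowed gradients (negative cosine) are returned as 0."""
    num = ps * p + qs * q + 1.0
    den = math.sqrt((1.0 + ps * ps + qs * qs) * (1.0 + p * p + q * q))
    return r0 * max(num / den, 0.0)


# Source at the viewer: brightest at (0, 0), circular contours.
print(reflectance_map(0, 0, 0, 0), round(reflectance_map(1, 0, 0, 0), 4))   # 1.0 0.7071
# Terminator for a source at (1, 0): the line p = -1.
print(reflectance_map(-1, 0.5, 1, 0))   # 0.0
```

The terminator is the line $p_s p + q_s q + 1 = 0$. The maximum value $r_0$ is reached where (p, q) equals the source gradient $(p_s, q_s)$, that is, where the normal points at the light.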
3.5.3 Photometric Stereo

The reflectance equation (3.42) constrains the possible surface orientation to a locus on the reflectance map. Multiple light source positions can determine the orientation uniquely [Woodham 1978]. Each separate light position gives a separate value for the intensity (proportional to radiance) at each point f(x, y). If the surface reflectance $r_0$ is unknown, three equations are needed to determine the reflectance together with the unit normal $\mathbf{n}$. If each source position vector is denoted by $\mathbf{n}_k$, k = 1, ..., 3, the following equations result:

$$I_k(x, y) = r_0\,(\mathbf{n}_k \cdot \mathbf{n}), \qquad k = 1, \ldots, 3 \qquad (3.43)$$

where I is normalized intensity. In matrix form

$$\mathbf{I} = r_0 N \mathbf{n} \qquad (3.44)$$

Fig. 3.30 Contours of constant radiance for a specular/matte surface.



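where

$$\mathbf{I} = [I_1(x, y),\; I_2(x, y),\; I_3(x, y)]^T$$

and

$$N = \begin{bmatrix} n_{11} & n_{12} & n_{13} \\ n_{21} & n_{22} & n_{23} \\ n_{31} & n_{32} & n_{33} \end{bmatrix} \qquad (3.45)$$

whose kth row is the source vector $\mathbf{n}_k$.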

Here $\mathbf{I} = \mathbf{f}c$, where c is the appropriate normalization constant. If c is not known, it can be regarded as part of $r_0$ without affecting the calculation of the normal direction. As long as the three source vectors $\mathbf{n}_1, \mathbf{n}_2, \mathbf{n}_3$ are not coplanar, the matrix N will have an inverse. Then solve for $r_0$ and $\mathbf{n}$ by using (3.44). First, use the fact that $\mathbf{n}$ is a unit vector to derive

$$r_0 = \left| N^{-1} \mathbf{I} \right| \qquad (3.46)$$

and then solve for $\mathbf{n}$ to obtain

$$\mathbf{n} = \frac{N^{-1} \mathbf{I}}{r_0} \qquad (3.47)$$

Examples of a particular solution are shown in Fig. 3.31. Of course, a prerequisite for using this method is that the surface point not be in shadow for any of the sources.
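Eqs. (3.46) and (3.47) can be applied at one pixel with a small Python sketch. It inverts the 3 × 3 system by Cramer's rule so that no external library is needed; any linear solver would do. The example source directions, the true normal, and $r_0 = 0.8$ are made up to check the round trip. They are chosen so that all three intensities are positive, which keeps the point out of shadow.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve m x = b by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("source directions are coplanar; N is singular")
    x = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = b[r]
        x.append(det3(mc) / d)
    return x


def photometric_stereo(N, I):
    """Eqs. (3.46)-(3.47): r0 = |N^-1 I| and n = N^-1 I / r0.
    Rows of N are unit source directions; I holds the three intensities."""
    m = solve3(N, I)
    r0 = math.sqrt(sum(v * v for v in m))
    return r0, [v / r0 for v in m]


s = 1 / math.sqrt(2)
N = [[0.0, 0.0, 1.0], [s, 0.0, s], [0.0, s, s]]
true_n = [0.2, -0.3, 1.0]
length = math.sqrt(sum(v * v for v in true_n))
true_n = [v / length for v in true_n]
I = [0.8 * sum(a * b for a, b in zip(row, true_n)) for row in N]   # Eq. (3.43)
r0, n = photometric_stereo(N, I)
```

Cramer's rule is used only to keep the sketch self-contained. In practice, $N^{-1}$ depends only on the light positions, so it can be computed once and applied at every pixel.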

Fig. 3.31 A particular solution for photometric stereo. (Contour shown: $R_2(p, q) = 0.723$.)

3.5.4 Shape from Shading by Relaxation

Combining local information allows improved estimates for edges (Section 3.3.5) and for disparity (Section 3.4.2). In a similar manner, local information can help in computing surface orientation [Ikeuchi 1980]. Basically, the reflectance equation


provides one constraint on the surface orientation, and another is provided by the heuristic requirement that the surface be smooth.

Suppose there is an estimate of the surface normal at a point, (p(x, y), q(x, y)). If the normal is not accurate, the reflectivity equation I(x, y) = R(p, q) will not hold. Thus it seems reasonable to seek p and q that minimize $(I - R)^2$. The other requirement is that p(x, y) and q(x, y) be smooth, and this can be measured by their Laplacians $\nabla^2 p$ and $\nabla^2 q$. For a smooth surface, both of these terms should be small. The goal is to minimize the error at a point,

$$E(x, y) = \left(I(x, y) - R(p, q)\right)^2 + \lambda\left[(\nabla^2 p)^2 + (\nabla^2 q)^2\right] \qquad (3.48)$$

where the Lagrange multiplier λ [Russell 1976] incorporates the smoothness constraint. Differentiating E(x, y) with respect to p and q and approximating derivatives numerically gives the following equations for p(x, y) and q(x, y):

$$p(x, y) = p_{av}(x, y) + T(x, y, p, q)\,\frac{\partial R}{\partial p} \qquad (3.49)$$

$$q(x, y) = q_{av}(x, y) + T(x, y, p, q)\,\frac{\partial R}{\partial q} \qquad (3.50)$$

where

$$T(x, y, p, q) = \frac{1}{\lambda}\left[I(x, y) - R(p, q)\right]$$

using

$$p_{av}(x, y) = \tfrac{1}{4}\left[p(x+1, y) + p(x-1, y) + p(x, y+1) + p(x, y-1)\right] \qquad (3.51)$$

and a similar expression for $q_{av}$. Eqs. (3.49) and (3.50) lend themselves to solution by the Gauss-Seidel method. Evaluate the right-hand sides with the current estimates of p and q to obtain the left-hand sides, and then use the results as the new estimates on the right-hand sides. More formally,

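Algorithm 3.3: Shape from Shading [Ikeuchi 1980]

Step 0. k = 0. Pick initial values of p(x, y) and q(x, y) near boundaries.

Step 1. k = k + 1; compute

$$p^k = p_{av}^{k-1} + T\,\frac{\partial R}{\partial p}, \qquad q^k = q_{av}^{k-1} + T\,\frac{\partial R}{\partial q}$$

Step 2. If the sum of all the E's is sufficiently small, stop. Else, go to step 1.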
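A minimal Python sketch of Algorithm 3.3 for a Lambertian reflectance map follows. Several details are assumptions. The outer ring of pixels holds fixed boundary values. Interior pixels are updated in place, which is what makes the sweep Gauss-Seidel. The derivatives ∂R/∂p and ∂R/∂q are taken by central differences, and T is evaluated at the pixel's current (p, q) as in Eqs. (3.49) and (3.50). A fixed iteration count stands in for the error test of Step 2. The test scene is a plane of gradient (0.3, 0.1) whose boundary values are known, with the interior started at zero.

```python
import math


def lambert_R(p, q, ps, qs):
    """Lambertian reflectance map (r0 = 1) for a source in direction (ps, qs, 1)."""
    num = ps * p + qs * q + 1.0
    den = math.sqrt((1.0 + ps * ps + qs * qs) * (1.0 + p * p + q * q))
    return max(num / den, 0.0)


def shape_from_shading(I, p, q, ps, qs, lam=10.0, iters=2000, h=1e-5):
    """Relax Eqs. (3.49)-(3.51) in place on equal-sized 2-D lists I, p, q.

    The outer ring of p and q is treated as boundary values and never
    updated. Interior pixels are swept in place (Gauss-Seidel).
    """
    H, W = len(I), len(I[0])
    for _ in range(iters):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                pav = 0.25 * (p[y][x + 1] + p[y][x - 1] + p[y + 1][x] + p[y - 1][x])
                qav = 0.25 * (q[y][x + 1] + q[y][x - 1] + q[y + 1][x] + q[y - 1][x])
                pc, qc = p[y][x], q[y][x]          # current estimate at (x, y)
                T = (I[y][x] - lambert_R(pc, qc, ps, qs)) / lam
                dRdp = (lambert_R(pc + h, qc, ps, qs) - lambert_R(pc - h, qc, ps, qs)) / (2 * h)
                dRdq = (lambert_R(pc, qc + h, ps, qs) - lambert_R(pc, qc - h, ps, qs)) / (2 * h)
                p[y][x] = pav + T * dRdp
                q[y][x] = qav + T * dRdq
    return p, q


H = W = 5
ps, qs, pt, qt = 0.5, 0.2, 0.3, 0.1


def on_edge(x, y):
    return x in (0, W - 1) or y in (0, H - 1)


I = [[lambert_R(pt, qt, ps, qs)] * W for _ in range(H)]
p = [[pt if on_edge(x, y) else 0.0 for x in range(W)] for y in range(H)]
q = [[qt if on_edge(x, y) else 0.0 for x in range(W)] for y in range(H)]
shape_from_shading(I, p, q, ps, qs)
print(round(p[2][2], 3), round(q[2][2], 3))   # 0.3 0.1
```

The averaging term spreads the boundary values inward, and the data term pulls each estimate toward the contour R(p, q) = I. In this example, the constant true gradient satisfies both, so it is the fixed point.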



A loose end in this algorithm is that boundary conditions must be specified. These are values of p and q, determined a priori, that remain constant throughout each iteration. The simplest place to specify a surface gradient is at an occluding contour (see Fig. 3.32), where the surface normal is nearly 90° to the line of sight. Unfortunately, p and q are infinite at these points. Ikeuchi's elegant solution is to use a different coordinate system for gradient space, that of a Gaussian sphere (Appendix 1). In this system, the surface normal is described by where it intersects the sphere when the tail of the normal is at the sphere's origin. This is the point at which a plane perpendicular to the normal would touch the sphere if translated toward it (Fig. 3.32b).

In this system the radiance may be described in terms of the spherical coordinates θ, φ. For a Lambertian surface

$$R(\theta, \phi) = \cos\theta \cos\theta_s + \sin\theta \sin\theta_s \cos(\phi - \phi_s) \qquad (3.52)$$

At an occluding contour φ = π/2, and θ is given by $\tan^{-1}(\partial y / \partial x)$, where the derivatives are calculated at the occluding contour (Fig. 3.32c).

Fig. 3.32 (a) Occluding contour. (b) Gaussian sphere. (c) Calculating θ from occluding contour.

Using the (θ, φ) formulation instead of the (p, q) formulation is easy. Simply substitute θ for p and φ for q in all instances of the formula in Algorithm 3.3.
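The sketch below converts a gradient (p, q) to sphere angles in Python and checks that Eq. (3.52) agrees with the gradient-space form of Eq. (3.42). It assumes that θ is the polar angle of the unit normal measured from the viewing (z) axis and that φ is its azimuth. This naming is our assumption, chosen so that Eq. (3.52) is the dot product of two unit vectors. The last line shows why the sphere helps: as (p, q) grows without bound toward an occluding contour, the polar angle stays finite.

```python
import math


def gradient_to_sphere(p, q):
    """Angles of the unit normal (p, q, 1)/sqrt(1 + p^2 + q^2):
    polar angle from the z (viewing) axis, then azimuth."""
    return math.atan(math.hypot(p, q)), math.atan2(q, p)


def lambert_R_sphere(theta, phi, theta_s, phi_s):
    """Eq. (3.52): dot product of the two unit vectors written in angles."""
    return (math.cos(theta) * math.cos(theta_s)
            + math.sin(theta) * math.sin(theta_s) * math.cos(phi - phi_s))


def lambert_R_gradient(p, q, ps, qs):
    """The same quantity in gradient space, Eq. (3.42) with r0 = 1 (no clamping)."""
    return (ps * p + qs * q + 1.0) / math.sqrt((1 + ps * ps + qs * qs) * (1 + p * p + q * q))


t, f = gradient_to_sphere(0.3, -0.2)
ts, fs = gradient_to_sphere(0.5, 0.4)
print(abs(lambert_R_sphere(t, f, ts, fs) - lambert_R_gradient(0.3, -0.2, 0.5, 0.4)) < 1e-12)   # True
print(round(gradient_to_sphere(1e9, 0.0)[0], 6) == round(math.pi / 2, 6))                    # True
```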
3.6 OPTICAL FLOW

Much of the work on computer analysis of visual motion assumes a stationary observer and a stationary background. In contrast, biological systems typically move relatively continuously through the world, and the image projected on their retinas varies essentially continuously while they move. Human beings perceive smooth continuous motion as such.

Although biological visual systems are discrete, this quantization is so fine that it is capable of producing essentially continuous outputs. These outputs can mirror the continuous flow of the imaged world across the retina. Such continuous information is called optical flow. Postulating optical flow as an input to a perceptual system leads to interesting methods of motion perception. The optical flow, or instantaneous velocity field, assigns to every point on the visual field a two-dimensional "retinal velocity" at which it is moving across the visual field. This section describes how approximations to instantaneous flow may be computed from the usual input situation, a sequence of discrete images. Methods of using optical flow to compute the observer's motion, a relative depth map, the surface normals of his or her surroundings, and other useful information are given in Chapter 7.
3.6.1 The Fundamental Flow Constraint

One of the important features of optical flow is that it can be calculated simply, using local information. One way of doing this is to model the motion image by a continuous variation of image intensity as a function of position and time, and then expand the intensity function f(x, y, t) in a Taylor series:

$$f(x + dx, y + dy, t + dt) = f(x, y, t) + \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial t}dt + \text{higher-order terms} \qquad (3.53)$$

As usual, the higher-order terms are henceforth ignored. The crucial observation to be exploited is the following. Suppose that the image at some time t + dt is the result of the original image at time t being moved translationally by dx and dy. Then, in fact,

$$f(x + dx, y + dy, t + dt) = f(x, y, t) \qquad (3.54)$$

Consequently, from Eqs. (3.53) and (3.54),

$$-\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} \qquad (3.55)$$

Now ∂f/∂x, ∂f/∂y, and ∂f/∂t are all measurable quantities. The quantities dx/dt and dy/dt are what we are looking for: estimates of the velocity in the x and y directions. Writing dx/dt = u and dy/dt = v gives

$$-\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x}u + \frac{\partial f}{\partial y}v \qquad (3.56)$$

or equivalently,

$$-f_t = \nabla f \cdot \mathbf{u} \qquad (3.57)$$

where ∇f is the spatial gradient of the image and $\mathbf{u} = (u, v)$ is the velocity.

The implications of (3.57) are interesting. Consider a fixed camera with a scene moving past it. The equations say that the time rate of change in intensity at a point in the image is, to first order, explained as follows. It equals the spatial rate of change in the intensity of the scene multiplied by the velocity at which points of the scene move past the camera.

This equation also indicates that the velocity (u, v) must lie on a line perpendicular to the vector $(f_x, f_y)$, where $f_x$ and $f_y$ are the partial derivatives with respect to x and y, respectively (Fig. 3.33). In fact, if the partial derivatives are very accurate, the magnitude of the velocity component in the direction $(f_x, f_y)$ is (from 3.57)

$$\frac{|f_t|}{\left(f_x^2 + f_y^2\right)^{1/2}}$$
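The flow constraint can be made concrete with a short Python sketch that estimates $f_x$, $f_y$, and $f_t$ from two frames. It then returns the velocity component along the gradient that Eq. (3.57) implies. The finite-difference scheme is an assumption of the sketch: central differences averaged over both frames for space, and a plain frame difference for time, with unit spacing. The test image is a linear ramp f = 2x + 3y translating with velocity (1, 0.5).

```python
import math


def brightness_derivatives(f0, f1, x, y):
    """Estimate f_x, f_y, f_t at interior pixel (x, y) of frames f0, f1
    (2-D lists indexed [y][x]); unit spacing in x, y and t."""
    fx = 0.25 * (f0[y][x + 1] - f0[y][x - 1] + f1[y][x + 1] - f1[y][x - 1])
    fy = 0.25 * (f0[y + 1][x] - f0[y - 1][x] + f1[y + 1][x] - f1[y - 1][x])
    ft = f1[y][x] - f0[y][x]
    return fx, fy, ft


def normal_flow(fx, fy, ft):
    """Velocity component along (fx, fy) implied by Eq. (3.57):
    -ft * (fx, fy) / (fx^2 + fy^2); (0, 0) where the gradient vanishes."""
    g2 = fx * fx + fy * fy
    if g2 == 0.0:
        return 0.0, 0.0
    s = -ft / g2
    return s * fx, s * fy


f0 = [[2.0 * x + 3.0 * y for x in range(5)] for y in range(5)]
f1 = [[2.0 * (x - 1) + 3.0 * (y - 0.5) for x in range(5)] for y in range(5)]
fx, fy, ft = brightness_derivatives(f0, f1, 2, 2)
u, v = normal_flow(fx, fy, ft)
print(fx, fy, ft)                  # 2.0 3.0 -3.5
print(round(math.hypot(u, v), 4))  # 0.9707
```

The recovered magnitude, $3.5/\sqrt{13}$, equals the component of the true velocity (1, 0.5) along (2, 3). The component perpendicular to the gradient cannot be recovered from this single constraint.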
3.6.2 Calculating Optical Flow by Relaxation

Equation (3.57) constrains the velocity but does not determine it uniquely. The development of Section 3.5.4 motivates the search for a solution that satisfies Eq.

Fig. 3.33 Relation between (u, v) and $(f_x, f_y)$.

Sec. 3.6 Optical Flow


(3.57) as closely as possible and is also locally smooth [Horn and Schunck 1980]. In this case as well, the Laplacians of the two velocity components, $\nabla^2 u$ and $\nabla^2 v$, can measure local smoothness. Again using the method of Lagrange multipliers, minimize the flow error

$$E^2(x, y) = \left(f_x u + f_y v + f_t\right)^2 + \lambda^2\left[(\nabla^2 u)^2 + (\nabla^2 v)^2\right] \qquad (3.58)$$

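Differentiating this equation with respect to u and v provides equations for the change in error with respect to u and v, which must be zero for a minimum. Writing $\nabla^2 u$ as $u - u_{av}$ and $\nabla^2 v$ as $v - v_{av}$, these equations are

$$(\lambda^2 + f_x^2)\,u + f_x f_y\, v = \lambda^2 u_{av} - f_x f_t \qquad (3.59)$$

$$f_x f_y\, u + (\lambda^2 + f_y^2)\,v = \lambda^2 v_{av} - f_y f_t \qquad (3.60)$$

These equations may be solved for u and v, yielding

$$u = u_{av} - f_x \frac{P}{D} \qquad (3.61)$$

$$v = v_{av} - f_y \frac{P}{D} \qquad (3.62)$$

where

$$P = f_x u_{av} + f_y v_{av} + f_t, \qquad D = \lambda^2 + f_x^2 + f_y^2$$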
To turn this into an iterative equation for solving for u(x, y) and v(x, y), again use the Gauss-Seidel method.

Algorithm 3.4: Optical Flow [Horn and Schunck 1980]

k = 0. Initialize all $u^k$ and $v^k$ to zero.
Until some error measure is satisfied, do

$$u^k = u_{av}^{k-1} - f_x \frac{P}{D}, \qquad v^k = v_{av}^{k-1} - f_y \frac{P}{D}$$

As Horn and Schunck demonstrate, this method derives the flow from two time frames. It can be improved by using several time frames, taking the solution after one iteration at one time frame as the initial solution at the following time frame. That is:

Algorithm 3.5: Multiframe Optical Flow

t = 0. Initialize all u(x, y, 0), v(x, y, 0).
for t = 1 until maxframes do

$$u(x, y, t) = u_{av}(x, y, t-1) - f_x \frac{P}{D}, \qquad v(x, y, t) = v_{av}(x, y, t-1) - f_y \frac{P}{D}$$

The results of using synthetic data from a rotating checkered sphere are shown in Fig. 3.34.

Fig. 3.34 Optical flow results. (a), (b), and (c) are three frames from the rotating sphere; (d) is the derived three-dimensional flow after 32 such time frames.
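Algorithm 3.4 can be sketched in Python, with an optional warm start that plays the role of Algorithm 3.5. Several details are assumptions. The derivative arrays are supplied precomputed. The four-neighbor average replicates edge pixels. As in Algorithm 3.4, the averages come from the previous iterate, and a fixed iteration count replaces the error test. The example uses constant derivatives $f_x = 2$, $f_y = 3$, $f_t = -3.5$, as for a linear ramp f = 2x + 3y translating with velocity (1, 0.5).

```python
def neighbor_average(a, x, y):
    """Four-neighbor average with edge pixels replicated."""
    H, W = len(a), len(a[0])
    return 0.25 * (a[y][max(x - 1, 0)] + a[y][min(x + 1, W - 1)]
                   + a[max(y - 1, 0)][x] + a[min(y + 1, H - 1)][x])


def horn_schunck(fx, fy, ft, lam=1.0, iters=100, u=None, v=None):
    """Iterate Eqs. (3.61)-(3.62) with P = fx*u_av + fy*v_av + ft and
    D = lam^2 + fx^2 + fy^2. Pass the previous frame's u, v to warm-start
    (Algorithm 3.5); by default the flow starts at zero."""
    H, W = len(fx), len(fx[0])
    if u is None:
        u = [[0.0] * W for _ in range(H)]
    if v is None:
        v = [[0.0] * W for _ in range(H)]
    for _ in range(iters):
        nu = [[0.0] * W for _ in range(H)]
        nv = [[0.0] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                uav, vav = neighbor_average(u, x, y), neighbor_average(v, x, y)
                P = fx[y][x] * uav + fy[y][x] * vav + ft[y][x]
                D = lam * lam + fx[y][x] ** 2 + fy[y][x] ** 2
                nu[y][x] = uav - fx[y][x] * P / D
                nv[y][x] = vav - fy[y][x] * P / D
        u, v = nu, nv
    return u, v


fx = [[2.0] * 4 for _ in range(3)]
fy = [[3.0] * 4 for _ in range(3)]
ft = [[-3.5] * 4 for _ in range(3)]
u, v = horn_schunck(fx, fy, ft)
print(round(u[1][2], 4), round(v[1][2], 4))   # 0.5385 0.8077
```

Because the ramp has no structure along its contours, the smoothness term has nothing to propagate. The iteration therefore settles on the normal flow (7/13, 10.5/13) rather than the true velocity. Recovering the tangential component requires neighboring pixels with differently oriented gradients.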



3.7 RESOLUTION PYRAMIDS

What is the best spatial resolution for an image? The sampling theorem states that the maximum spatial frequency in the image data must be less than half the sampling frequency in order that the sampled image represent the original unambiguously. However, the sampling theorem is not a good predictor of how easily objects can be recognized by computer programs. Often objects can be more easily recognized in images that have a very low sampling rate. There are two reasons for this. First, the computations are fewer because of the reduction in dimensionality. Second, confusing detail present in the high-resolution versions of the images may not appear at the reduced resolution. But even though some objects are more easily found at low resolutions, an object description usually needs detail that is revealed only at the higher resolutions. This leads naturally to the notion of a pyramidal image data structure. In such a structure, the search for objects is begun at a low resolution and refined at ever-increasing resolutions until one reaches the highest resolution of interest. Figure 3.35 shows the correspondence between pixels for the pyramidal structure.

In the next three sections, pyramids are applied to gray-level images and edge images. Pyramids, however, are a very general tool and can be used to represent any image at varying levels of detail.
3.7.1 Gray-level Consolidation

In some applications, redigitizing the image with a different sampling rate is a way to reduce the number of samples. However, most digitizer parameters are difficult to change, so computational means of reduction are often needed. A straightforward method is to partition the digitized image into nonoverlapping



neighborhoods of equal size and shape and to replace each of those neighborhoods by the average pixel value in that neighborhood. This operation is consolidation. For an n × n neighborhood, consolidation is equivalent to averaging the original image over the neighborhood and then sampling at intervals n units apart.

Consolidation tends to offset the aliasing that would be introduced by sampling the sensed data at a reduced rate. This is due to the effects of the averaging step in the consolidation process. For the one-dimensional case where

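$$\bar{f}(x) = \tfrac{1}{2}\left[f(x) + f(x + \Delta)\right] \qquad (3.63)$$

the corresponding Fourier transform [Steiglitz 1974] is

$$\bar{F}(u) = F(u)\,H(u), \qquad H(u) = \tfrac{1}{2}\left[1 + e^{i 2\pi u \Delta}\right] = e^{i\pi u/u_0}\cos\!\left[\pi\frac{u}{u_0}\right] \qquad (3.64)$$

which has magnitude $|H(u)| = \left|\cos[\pi(u/u_0)]\right|$ and phase $\pi(u/u_0)$. The sampling frequency is $u_0 = 1/\Delta$, where Δ is the spacing between samples. Thus the averaging step has the effect of attenuating the higher frequencies of F(u), as shown in Fig. 3.36. Since the higher frequencies are involved in aliasing, attenuating these frequencies reduces the aliasing effects.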
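Consolidation and the pyramid built by repeating it can be sketched in a few lines of Python. Each n × n block is replaced by its mean, and repeated 2 × 2 consolidation yields the successive pyramid levels. Requiring the image sides to be multiples of n is a simplifying assumption of this sketch.

```python
def consolidate(img, n):
    """Replace each non-overlapping n x n block by its mean (sides must be
    multiples of n): averaging followed by sampling every n pixels."""
    H, W = len(img), len(img[0])
    if H % n or W % n:
        raise ValueError("image size must be a multiple of n")
    return [[sum(img[y][x] for y in range(by, by + n) for x in range(bx, bx + n)) / (n * n)
             for bx in range(0, W, n)]
            for by in range(0, H, n)]


def pyramid(img, levels):
    """Repeated 2 x 2 consolidation; element 0 is the original image."""
    out = [img]
    for _ in range(levels):
        out.append(consolidate(out[-1], 2))
    return out


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(pyramid(img, 2)[1:])   # [[[3.5, 5.5], [11.5, 13.5]], [[8.5]]]
```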
3.7.2 Pyramidal Structures in Correlation

With correlation matching, multiple-resolution techniques can sometimes provide significant functional and computational advantages [Moravec 1977]. Binary search correlation uses pyramids of the input image and reference
Fig. 3.36 Consolidation effects viewed in the spatial frequency domain. (a) Original transform. (b) Transform of averaging operator. (c) Transform of averaged image.
Sec. 3.7 Resolution Pyramids 107

patterns. The algorithm partakes of the computational efficiency of binary (as opposed to linear) search [Knuth 1973]. Further, the low-resolution correlation operations at high levels in the pyramid ensure that the earlier correlations are on gross image features rather than details.

In binary search correlation, a feature to be located is at some unknown location in the input image. The reference version of the feature originates in another image, the reference image. The feature in the reference image is contained in a window of n × n pixels. The task of the correlator is to find an n × n window in the input image that best matches the reference image window containing the feature. The details of the correlation processes are given in the following algorithm.

Algorithm 3.6: Binary Search Correlation Control Algorithm

Definitions

OrigReference: an N × N image containing a feature centered at (FeatureX, FeatureY).
OrigInput: an M × M array in which an instance of the Feature is to be located. For simplicity, assume that it is at the same resolution as OrigReference.
n: a window size; an n × n window in OrigReference is large enough to contain the Feature.
Window: an n × n array containing a varying-resolution subimage of OrigReference centered on the Feature.
Input: a 2n × 2n array containing a varying-resolution subimage of OrigInput, centered on the best match for the Feature.
Reference: a temporary array.

Algorithm

1. Input := Consolidate OrigInput by a factor of 2n/M to size 2n × 2n.
2. Reference := Consolidate OrigReference by the same factor 2n/M to size 2nN/M × 2nN/M. This consolidation takes the Feature to a new (FeatureX, FeatureY).
3. Window := n × n window from Reference centered on the new (FeatureX, FeatureY).
4. Calculate the match metric of the window at the $(n + 1)^2$ locations in Input at which it is wholly contained. Say that the best match occurs at (BestMatchX, BestMatchY) in Input.

5. Input := n × n window from Input centered at (BestMatchX, BestMatchY), enlarged by a factor of 2.
6. Reference := Reference enlarged by a factor of 2. This takes the Feature to a new (FeatureX, FeatureY).
7. Go to 3.

Through time, the algorithm uses a reference image for matching that is always centered on the feature to be matched. This reference homes in on the feature by being increased in resolution, and thus reduced in linear image coverage, by a factor of 2 each time. In the input image, a similar homing in is going on, but the search area is usually twice the linear dimension of the reference window. Further, the center of the search area varies in the input image as the improved resolution refines the point of best match.

Binary search correlation is for matching features with context. The template at low resolution possibly corresponds to much of the area around the feature, while the feature itself may be so small in the initial consolidated images as to be invisible. The coarse-to-fine strategy is perfect for such conditions, since it allows gross features to be matched first and to guide the later high-resolution search for the best match. Such matching with context is less useful for locating several instances of a shape dotted at random around an image.
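The coarse-to-fine idea can be sketched in Python. This is a simplified variant in the spirit of Algorithm 3.6, not a transcription of it. Both images are consolidated by 2 × 2 averaging, and an exhaustive search is made only at the coarsest level. At each finer level, the doubled estimate is refined over a 4 × 4 neighborhood. The sum of squared differences serves as the match metric; the algorithm leaves the metric unspecified, so SSD is an assumption. The loop stops once full resolution is reached, and all function names are illustrative.

```python
import random


def shrink(img):
    """One 2 x 2 consolidation step (trailing odd row/column dropped)."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]


def ssd(img, tpl, ox, oy):
    """Sum of squared differences with tpl's corner at (ox, oy) in img."""
    return sum((img[oy + y][ox + x] - tpl[y][x]) ** 2
               for y in range(len(tpl)) for x in range(len(tpl[0])))


def coarse_to_fine_match(img, tpl, levels):
    """Return the (x, y) corner of the best match of tpl in img."""
    imgs, tpls = [img], [tpl]
    for _ in range(levels):
        imgs.append(shrink(imgs[-1]))
        tpls.append(shrink(tpls[-1]))

    def best(level, xs, ys):
        I, T = imgs[level], tpls[level]
        mx, my = len(I[0]) - len(T[0]), len(I) - len(T)
        cands = [(x, y) for y in ys for x in xs if 0 <= x <= mx and 0 <= y <= my]
        return min(cands, key=lambda c: ssd(I, T, c[0], c[1]))

    top = imgs[-1]
    bx, by = best(levels, range(len(top[0])), range(len(top)))   # exhaustive, coarsest
    for lev in range(levels - 1, -1, -1):                         # refine level by level
        bx, by = best(lev, range(2 * bx - 1, 2 * bx + 3), range(2 * by - 1, 2 * by + 3))
    return bx, by


rng = random.Random(0)
img = [[rng.random() for _ in range(32)] for _ in range(32)]
tpl = [row[8:16] for row in img[12:20]]            # 8 x 8 patch with corner (8, 12)
print(coarse_to_fine_match(img, tpl, levels=2))    # (8, 12)
```

With levels = 2, the exhaustive search covers 49 positions of a 2 × 2 template. Each finer level then tests only 16 positions, instead of the 625 full-resolution positions of a direct search.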
3.7.3 Pyramidal Structures in Edge Detection

As an example of the use of pyramidal structures in processing, consider their use in edge detection. This application, after [Tanimoto and Pavlidis 1975], uses two pyramids, one to store the image and another to store the image edges. The idea of the algorithm is that a neighborhood in the low-resolution image where the gray-level values are the same is taken to imply that there is in fact no gray-level change (edge) in the neighborhood. Of course, the low-resolution levels in the pyramid tend to blur the image and thus attenuate the gray-level changes that denote edges. Thus the starting level in the pyramid must be picked judiciously to ensure that the important edges are detected.

Algorithm 3.7: Hierarchical Edge Detection

recursive procedure refine(k, x, y)
begin
    if k < MaxLevel then
        for dx := 0 until 1 do
            for dy := 0 until 1 do
                if EdgeOp(k + 1, 2x + dx, 2y + dy) > Threshold(k + 1)
                    then refine(k + 1, 2x + dx, 2y + dy)
end;

Fig. 3.37 Pyramidal edge detection.



procedure FindEdges:
begin
    comment apply operator to every pixel in the starting level s, refining where necessary;
    for x := 0 until 2^s - 1 do
        for y := 0 until 2^s - 1 do
            if EdgeOp(s, x, y) > Threshold(s) then refine(s, x, y)
end;
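Algorithm 3.7 can be sketched in Python on a 2^L × 2^L image. Level k of the pyramid holds 2^k × 2^k pixels, matching the loop bounds of FindEdges, and pixel (x, y) at level k has children (2x + dx, 2y + dy) at level k + 1. The remaining details are hypothetical choices for illustration. The edge operator is the largest absolute difference to the right and lower neighbors, and a single threshold is used at all levels. Edges that survive to full resolution are recorded, and the operator calls are counted.

```python
def shrink(img):
    """One 2 x 2 consolidation step."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]


def edge_op(img, x, y):
    """Hypothetical edge operator: largest absolute difference to the right
    and lower neighbors (border pixels replicated)."""
    n = len(img)
    c = img[y][x]
    return max(abs(img[y][min(x + 1, n - 1)] - c), abs(img[min(y + 1, n - 1)][x] - c))


def find_edges(img, s, threshold):
    """Test every pixel of level s, refine only where the operator fires.
    Returns the full-resolution edge pixels and the number of operator calls."""
    levels = [img]
    while len(levels[-1]) > 1:
        levels.append(shrink(levels[-1]))
    levels.reverse()                         # levels[k] is 2^k x 2^k
    max_level = len(levels) - 1
    found, calls = set(), 0

    def fires(k, x, y):
        nonlocal calls
        calls += 1
        return edge_op(levels[k], x, y) > threshold

    def refine(k, x, y):
        if k == max_level:                   # reached full resolution: record edge
            found.add((x, y))
            return
        for dx in (0, 1):
            for dy in (0, 1):
                cx, cy = 2 * x + dx, 2 * y + dy
                if fires(k + 1, cx, cy):
                    refine(k + 1, cx, cy)

    for x in range(2 ** s):
        for y in range(2 ** s):
            if fires(s, x, y):
                refine(s, x, y)
    return found, calls


img = [[0.0 if x < 4 else 10.0 for x in range(8)] for y in range(8)]
edges, calls = find_edges(img, s=1, threshold=5.0)
print(sorted(edges) == [(3, y) for y in range(8)], calls)   # True 28
```

For this vertical step, 28 operator calls find the same edge pixels that 64 calls would find when the operator is applied at every full-resolution pixel.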

Figure 3.37 shows Tanimoto's results for a chromosome image. The table inset shows the computational advantage, in terms of the calls to the edge operator, as a function of the starting level. Similar kinds of edge detection strategies based on pyramids have been pursued by [Levine 1978; Hanson and Riseman 1978]. The latter effort is a little different in that processing within the pyramid is bidirectional: information from edges detected at a high-resolution level is projected to low-resolution levels of the pyramid.

EXERCISES

3.1 Derive an analytical expression for the response of the Sobel operator to a vertical step edge as a function of the distance of the edge to the center of the operator.
3.2 Use the formulas of Eqs. (3.31) to derive the digital template function for $g_1$ in a 5 × 3 pixel domain.
3.3 Specify a version of Algorithm 3.1 that uses the gradient edge operator instead of the "crack" edge operator.
3.4 In photometric stereo, three or more light source positions are used to determine a surface orientation. The dual of this problem uses surface orientations to determine light source position. What is the usefulness of the latter formulation? In particular, how does it relate to Algorithm 3.3?
3.5 Using any one of Algorithms 3.1 through 3.4 as an example, show how it could be modified to use pyramidal data structures.
3.6 Write a reflectance function to capture the "grazing incidence" phenomenon: surfaces become more mirrorlike at small angles of incidence (and reflectance).
3.7 Equations 3.49 and 3.50 were derived by minimizing the local error. Show how these equations are modified when the total error [i.e., $\sum_{x,y} E(x, y)$] is minimized.

REFERENCES
ABDOU, I. E. "Quantitative methods of edge detection." USCIPI Report 830, Image Processing Institute, Univ. Southern California, July 1978.
AKATSUKA, T., T. ISOBE, and O. TAKATANI. "Feature extraction of stomach radiograph." Proc., 2nd IJCPR, August 1974, 324-328.
References


ANDREWS, H. C. and B. R. HUNT. Digital Image Restoration. Englewood Cliffs, NJ: Prentice-Hall, 1977.
ATTNEAVE, F. "Some informational aspects of visual perception." Psychological Review 61, 1954.
BARROW, H. G. and J. M. TENENBAUM. "Computational vision." Proc. IEEE 69, 5, May 1981, 572-595.
BARROW, H. G. and J. M. TENENBAUM. "Recovering intrinsic scene characteristics from images." Technical Note 157, AI Center, SRI International, April 1978.
BINFORD, T. O. "Visual perception by computer." Proc., IEEE Conf. on Systems and Control, Miami, December 1971.
BLINN, J. F. "Computer display of curved surfaces." Ph.D. dissertation, Computer Science Dept., Univ. Utah, 1978.
FREI, W. and C. C. CHEN. "Fast boundary detection: a generalization and a new algorithm." IEEE Trans. Computers 26, 2, October 1977, 988-998.
GONZALEZ, R. C. and P. WINTZ. Digital Image Processing. Reading, MA: Addison-Wesley, 1977.
GRIFFITH, A. K. "Edge detection in simple scenes using a priori information." IEEE Trans. Computers 22, 4, April 1973.
HANSON, A. R. and E. M. RISEMAN (Eds.). Computer Vision Systems (CVS). New York: Academic Press, 1978.
HORN, B. K. P. "Determining lightness from an image." CGIP 3, 4, December 1974, 277-299.
HORN, B. K. P. "Shape from shading." In PCV, 1975.
HORN, B. K. P. and B. G. SCHUNCK. "Determining optical flow." AI Memo 572, AI Lab, MIT, April 1980.
HORN, B. K. P. and R. W. SJOBERG. "Calculating the reflectance map." Proc., DARPA IU Workshop, November 1978, 115-126.
HUBEL, D. H. and T. N. WIESEL. "Brain mechanisms of vision." Scientific American, September 1979, 150-162.
HUECKEL, M. "An operator which locates edges in digitized pictures." J. ACM 18, 1, January 1971, 113-125.
HUECKEL, M. "A local visual operator which recognizes edges and lines." J. ACM 20, 4, October 1973, 634-647.
IKEUCHI, K. "Numerical shape from shading and occluding contours in a single view." AI Memo 566, AI Lab, MIT, revised February 1980.
KIRSCH, R. A. "Computer determination of the constituent structure of biological images." Computers and Biomedical Research 4, 3, June 1971, 315-328.
KNUTH, D. E. The Art of Computer Programming. Reading, MA: Addison-Wesley, 1973.
LEVINE, M. D. "A knowledge-based computer vision system." In CVS, 1978.
LIU, H. K. "Two- and three-dimensional boundary detection." CGIP 6, 2, 1977, 123-134.
MARR, D. and T. POGGIO. "Cooperative computation of stereo disparity." Science 194, 1976, 283-287.
MARR, D. and T. POGGIO. "A theory of human stereo vision." AI Memo 451, AI Lab, MIT, November 1977.
MERO, L. and Z. VASSY. "A simplified and fast version of the Hueckel operator for finding optimal edges in pictures." Proc., 4th IJCAI, September 1975, 650-655.
MORAVEC, H. P. "Towards automatic visual obstacle avoidance." Proc., 5th IJCAI, August 1977, 584.
NEVATIA, R. "Evaluation of a simplified Hueckel edge-line detector." Note, CGIP 6, 6, December 1977, 582-588.
PHONG, B. T. "Illumination for computer generated pictures." Commun. ACM 18, 6, June 1975, 311-317.
PINGLE, K. K. and J. M. TENENBAUM. "An accommodating edge follower." Proc., 2nd IJCAI, September 1971, 1-7.
Ch. 3 Early Processing

PRAGER, J. M. "Extracting and labeling boundary segments in natural scenes." IEEE Trans. PAMI 2, 1, January 1980, 16-27.
PRATT, W. K. Digital Image Processing. New York: Wiley-Interscience, 1978.
PREWITT, J. M. S. "Object enhancement and extraction." In Picture Processing and Psychopictorics, B. S. Lipkin and A. Rosenfeld (Eds.). New York: Academic Press, 1970.
QUAM, L. and M. J. HANNAH. "Stanford automated photogrammetry research." AIM-254, Stanford AI Lab, November 1974.
ROBERTS, L. G. "Machine perception of three-dimensional solids." In Optical and Electro-optical Information Processing, J. P. Tippett et al. (Eds.). Cambridge, MA: MIT Press, 1965.
ROSENFELD, A. and A. C. KAK. Digital Picture Processing. New York: Academic Press, 1976.
ROSENFELD, A., R. A. HUMMEL, and S. W. ZUCKER. "Scene labelling by relaxation operations." IEEE Trans. SMC 6, 1976, 430.
RUSSELL, D. L. (Ed.). Calculus of Variations and Control Theory. New York: Academic Press, 1976.
SHAPIRA, R. "A technique for the reconstruction of a straight-edge, wire-frame object from two or more central projections." CGIP 3, 4, December 1974, 318-326.
SHIRAI, Y. "Analyzing intensity arrays using knowledge about scenes." In PCV, 1975.
STEIGLITZ, K. An Introduction to Discrete Systems. New York: Wiley, 1974.
STOCKHAM, T. J., Jr. "Image processing in the context of a visual model." Proc. IEEE 60, 7, July 1972, 828-842.
TANIMOTO, S. and T. PAVLIDIS. "A hierarchical data structure for picture processing." CGIP 4, 2, June 1975, 104-119.
TRETIAK, O. J. "A parametric model for edge detection." Proc., 3rd COMPSAC, November 1979, 884-887.
TURNER, K. J. "Computer perception of curved objects using a television camera." Ph.D. dissertation, Univ. Edinburgh, 1974.
WECHSLER, H. and J. SKLANSKY. "Finding the rib cage in chest radiographs." Pattern Recognition 9, 1977, 21-30.
WHITTED, T. "An improved illumination model for shaded display." Comm. ACM 23, 6, June 1980, 343-349.
WOODHAM, R. J. "Photometric stereo: A reflectance map technique for determining surface orientation from image intensity." Proc., 22nd International Symp., Society of Photo-optical Instrumentation Engineers, San Diego, CA, August 1978, 136-143.
ZUCKER, S. W. and R. A. HUMMEL. "An optimal three-dimensional edge operator." Report 79-10, McGill Univ., April 1979.
ZUCKER, S. W., R. A. HUMMEL, and A. ROSENFELD. "An application of relaxation labeling to line and curve enhancement." IEEE Trans. Computers 26, 1977.


SEGMENTED IMAGES
[Figure: knowledge base; analogical models; analogical/propositional models; generalized image; segmented image; geometric structures; relational structures; edge following, texture, motion.]

The idea of segmentation has its roots in work by the Gestalt psychologists (e.g., Kohler), who studied the preferences exhibited by human beings in grouping or organizing sets of shapes arranged in the visual field. Gestalt principles dictate certain grouping preferences based on features such as proximity, similarity, and continuity. Other results had to do with figure/ground discrimination and optical illusions. The latter have provided a fertile ground for vision theories to post-Gestaltists such as Gibson and Gregory, who emphasize that these grouping mechanisms organize the scene into meaningful units that are a significant step toward image understanding.

In computer vision, grouping parts of a generalized image into units that are homogeneous with respect to one or more characteristics (or features) results in a segmented image. The segmented image extends the generalized image in a crucial respect: it contains the beginnings of domain-dependent interpretation. At this descriptive level the internal domain-dependent models of objects begin to influence the grouping of generalized image structures into units meaningful in the domain. For instance, the model may supply crucial parameters to segmentation procedures.

In the segmentation process there are two important aspects to consider: one is the data structure used to keep track of homogeneous groups of features; the other is the transformation involved in computing the features.

Two basic sorts of segments are natural: boundaries and regions. These can be combined into a single descriptive structure, a set of nodes (one per region), connected by arcs representing the "adjacency" relation. The "dual" of this structure has arcs corresponding to boundaries connecting nodes representing points where several regions meet. Chapters 4 and 5 describe segmentation with respect to boundaries and regions respectively, emphasizing gray levels and gray-level differences as indicators of segments. Of course, from the standpoint of the


Part II Segmented Images

algorithms involved, it is irrelevant whether the features are intensity gray levels or intrinsic image values perhaps representing motion, color, or range.

Texture and motion images are addressed in Chapters 6 and 7. Each has several computationally difficult aspects, and neither has received the attention given static, nontextured images. However, each is very important in the segmentation enterprise.


Boundary Detection

4.1 ON ASSOCIATING EDGE ELEMENTS

Boundaries of objects are perhaps the most important part of the hierarchy of structures that links raw image data with their interpretation [Marr 1975]. Chapter 3 described how various operators applied to raw image data can yield primitive edge elements. However, an image of only disconnected edge elements is relatively featureless; additional processing must be done to group edge elements into structures better suited to the process of interpretation. The goal of the techniques in this chapter is to perform a level of segmentation, that is, to make a coherent one-dimensional (edge) feature from many individual local edge elements. The feature could correspond to an object boundary or to any meaningful boundary between scene entities. The problems that edge-based segmentation algorithms have to contend with are shown by Fig. 4.1, which is an image of the local edge elements yielded by one common edge operator applied to a chest radiograph. As can be seen, the edge elements often exist where no meaningful scene boundary does, and conversely often are absent where a boundary is. For example, consider the boundaries of ribs as revealed by the edge elements. Missing edge elements and extra edge elements both tend to frustrate the segmentation process.

The methods in this chapter are ordered according to the amount of knowledge incorporated into the grouping operation that maps edge elements into boundaries. "Knowledge" means implicit or explicit constraints on the likelihood of a given grouping. Such constraints may arise from general physical arguments or (more often) from stronger restrictions placed on the image arising from domain-dependent considerations. If there is much knowledge, this implies that the global form of the boundary and its relation to other image structures is very constrained. Little prior knowledge means that the segmentation must proceed more on the basis of local clues and evidence and general (domain-independent) assumptions, with fewer expectations and constraints on the final resulting boundary.

Fig. 4.1 Edge elements in a chest radiograph.

These constraints take many forms. Knowledge of where to expect a boundary allows very restricted searches to verify the edge. In many such cases, the domain knowledge determines the type of curve (its parameterization or functional form) as well as the relevant "noise processes." In images of polyhedra, only straight-edged boundaries are meaningful, and they will come together at various sorts of vertices arising from corners, shadows of corners, and occlusions. Human rib boundaries appear approximately like conic sections in chest radiographs, and radiographs have complex edge structures that can compete with rib edges. All this specific knowledge can and should guide our choice of grouping method.

If less is known about the specific image content, one may have to fall back on general world knowledge or heuristics that are true for most domains. For instance, in the absence of evidence to the contrary, the shorter line between two points might be selected over a longer line. This sort of general principle is easily built into evaluation functions for boundaries, and used in segmentation algorithms that proceed by methodically searching for such groupings. If there are no a priori restrictions on boundary shapes, a general contour-extraction method is called for, such as edge following or linking of edge elements.

The methods we shall examine are the following:
1. Searching near an approximate location. These are methods for refining a boundary given an initial estimate.
2. The Hough transform. This elegant and versatile technique appears in various guises throughout computer vision. In this chapter it is used to detect boundaries whose shape can be described in an analytical or tabular form.
3. Graph searching. This method represents the image of edge elements as a graph. Thus a boundary is a path through a graph. Like the Hough transform, these techniques are quite generally applicable.
120 Ch.4 BoundaryDetection

4. Dynamic programming. This method is also very general. It uses a mathematical formulation of the globally best boundary and can find boundaries in noisy images.
5. Contour following. This hill-climbing technique works best with good image data.
4.2 SEARCHING NEAR AN APPROXIMATE LOCATION

If the approximate or a priori likely location of a boundary has been determined somehow, it may be used to guide the effort to refine that boundary [Kelly 1971]. The approximate location may have been found by one of the techniques below applied to a lower-resolution image, or it may have been determined using high-level knowledge.
4.2.1 Adjusting A Priori Boundaries

This idea was described by [Bolles 1977] (see Fig. 4.2). Local searches are carried out at regular intervals along directions perpendicular to the approximate (a priori) boundary. An edge operator is applied to each of the discrete points along each of these perpendicular directions. For each such direction, the edge with the highest magnitude is selected from among those whose orientations are nearly parallel to the tangent at the point on the nearby a priori boundary. If sufficiently many elements are found, their locations are fit with an analytic curve such as a low-degree polynomial, and this curve becomes the representation of the boundary.
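The perpendicular search and curve fit can be sketched as below. This is a hedged sketch, not Bolles's procedure: it assumes the a priori boundary is given as a function y = prior(x) (so that a polynomial in x is a sensible fit), uses NumPy's `np.gradient` as the edge operator, and treats an edge as "nearly parallel" to the boundary when its gradient lies within `max_angle` of the boundary normal. The function name and all parameters are hypothetical.

```python
import numpy as np

def refine_boundary(image, xs, prior, search=5, max_angle=np.deg2rad(20.0),
                    deg=2, min_points=5):
    """Refine an a priori boundary y = prior(x) by searching along its normals.

    Assumptions (hypothetical, not from the text): image[y, x] indexing,
    np.gradient as the edge operator, and a polynomial y(x) as the fitted curve.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                     # d/drow (y) and d/dcol (x)
    mag = np.hypot(gx, gy)
    h, w = img.shape
    found = []
    for x0 in xs:
        y0 = prior(x0)
        slope = prior(x0 + 0.5) - prior(x0 - 0.5)  # numerical tangent dy/dx
        t = np.array([1.0, slope]) / np.hypot(1.0, slope)
        nrm = np.array([-t[1], t[0]])             # unit normal to the prior boundary
        best = None
        for k in range(-search, search + 1):      # discrete points along the normal
            px, py = np.rint([x0 + k * nrm[0], y0 + k * nrm[1]]).astype(int)
            if not (0 <= px < w and 0 <= py < h) or mag[py, px] == 0:
                continue
            g = np.array([gx[py, px], gy[py, px]]) / mag[py, px]
            # Edge nearly parallel to the tangent <=> gradient nearly parallel to the normal.
            if np.arccos(min(1.0, abs(g @ nrm))) > max_angle:
                continue
            if best is None or mag[py, px] > mag[best[1], best[0]]:
                best = (px, py)                   # strongest acceptable edge so far
        if best is not None:
            found.append(best)
    if len(found) < min_points:
        return None                               # too few elements to fit a curve
    pts = np.array(found, dtype=float)
    return np.poly1d(np.polyfit(pts[:, 0], pts[:, 1], deg))
```

With a horizontal step edge near row 11 and a prior boundary guessed at y = 9, the refit line comes out at y = 11.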

Fig. 4.2 Search orientations from an approximate boundary location.

4.2.2 Nonlinear Correlation in Edge Space

In this correlation-like technique, the a priori boundary is treated as a rigid template, or piece of rigid wire along which edge operators are attached like beads. The a priori representation thus also contains relative locations at which the existence of edges will be tested (Fig. 4.3). An edge element returned by the edge operator application "matches" the a priori boundary if its contour is tangent to the template and its magnitude exceeds some threshold. The template is to be moved around the image, and for each location, the number of matches is computed. If the number of matches exceeds a threshold, the boundary location is declared to
Sec. 4.2 Searching near an Approximate Location 1 2 1

Fig. 4.3 A template for edge operator application.

be the current template location. If not, the template is moved to a different image point and the process is repeated. Either the boundary will be located or there will eventually be no more image points to try.
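A sketch of this rigid-template matching follows. It rests on several assumptions not in the text: edges come from `np.gradient`, each bead is stored as an offset (dx, dy) plus the template's tangent angle at that bead, and the template is scanned in raster order with the first location that reaches `min_matches` accepted. All names and thresholds are hypothetical.

```python
import numpy as np

def edge_maps(image):
    """Gradient magnitude and gradient angle (edge operator assumed: np.gradient)."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def count_matches(mag, ang, template, ox, oy, mag_thresh, tol):
    """Count beads (dx, dy, tangent) whose edge is strong and tangent to the template."""
    h, w = mag.shape
    n = 0
    for dx, dy, tangent in template:
        x, y = ox + dx, oy + dy
        if not (0 <= x < w and 0 <= y < h) or mag[y, x] <= mag_thresh:
            continue
        edge_dir = ang[y, x] + np.pi / 2           # edge runs perpendicular to the gradient
        diff = (edge_dir - tangent) % np.pi        # compare orientations, ignoring sense
        if min(diff, np.pi - diff) < tol:
            n += 1
    return n

def locate_template(image, template, mag_thresh, tol, min_matches):
    """Move the rigid template over the image; return the first offset with enough matches."""
    mag, ang = edge_maps(image)
    h, w = mag.shape
    for oy in range(h):
        for ox in range(w):
            if count_matches(mag, ang, template, ox, oy, mag_thresh, tol) >= min_matches:
                return ox, oy
    return None                                    # no more image points to try
```

For a bright square whose top edge lies between rows 9 and 10, a ten-bead horizontal template is declared found at offset (10, 9).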


4.2.3 Divide-and-Conquer Boundary Detection

This is a technique that is useful in the case that a low-curvature boundary is known to exist between two edge elements and the noise levels in the image are low (Algorithm 8.1). In this case, to find a boundary point in between the two known points, search along the perpendiculars of the line joining the two points. The point of maximum magnitude (if it is over some threshold) becomes a break point on the boundary, and the technique is applied recursively to the two line segments formed between the three known boundary points. (Some fix must be applied if the maximum is not unique.) Figure 4.4 shows one step in this process. Divide-and-conquer boundary detection has been used to outline kidney boundaries on computed tomograms (these images were described in Section 2.3.4) [Selfridge et al. 1979].
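The recursion can be sketched as follows. This is a minimal sketch, assuming a precomputed edge-magnitude array indexed `mag[y, x]`, a search line that always passes through the midpoint of the current segment, and the simple tie-breaking fix of keeping the first maximum found. To guarantee termination, the search half-width is also capped at a quarter of the segment length; that cap is an assumption, not part of the text.

```python
import numpy as np

def divide_and_conquer(mag, p, q, threshold, search=6, min_len=4.0):
    """Boundary points from p to q (inclusive), found by recursive splitting.

    mag is an edge-magnitude array indexed mag[y, x]; p and q are (x, y)
    boundary points assumed to be known already.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    d = q - p
    length = np.hypot(d[0], d[1])
    if length < min_len:
        return [tuple(p), tuple(q)]
    mid = (p + q) / 2
    nrm = np.array([-d[1], d[0]]) / length       # unit perpendicular to the chord pq
    k_max = min(search, int(length // 4))        # cap so that segments always shrink
    h, w = mag.shape
    best, best_val = None, threshold
    for k in range(-k_max, k_max + 1):           # first maximum wins ties
        x, y = np.rint(mid + k * nrm).astype(int)
        if 0 <= x < w and 0 <= y < h and mag[y, x] > best_val:
            best, best_val = np.array([x, y], dtype=float), mag[y, x]
    if best is None:                             # nothing over threshold: keep the chord
        return [tuple(p), tuple(q)]
    left = divide_and_conquer(mag, p, best, threshold, search, min_len)
    right = divide_and_conquer(mag, best, q, threshold, search, min_len)
    return left + right[1:]                      # the break point appears once
```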

Fig. 4.4 Divide-and-conquer technique.


Fig. 4.5 A line (a) in image space; (b) in parameter space.

4.3 THE HOUGH METHOD FOR CURVE DETECTION

The classical Hough technique for curve detection is applicable if little is known about the location of a boundary, but its shape can be described as a parametric curve (e.g., a straight line or conic). Its main advantages are that it is relatively unaffected by gaps in curves and by noise.

To introduce the method [Duda and Hart 1972], consider the problem of detecting straight lines in images. Assume that by some process image points have been selected that have a high likelihood of being on linear boundaries. The Hough technique organizes these points into straight lines, basically by considering all possible straight lines at once and rating each on how well it explains the data.

Consider the point $x'$ in Fig. 4.5a, and the equation for a line $y = mx + c$. What are the lines that could pass through $x'$? The answer is simply all the lines with $m$ and $c$ satisfying $y' = mx' + c$. Regarding $(x', y')$ as fixed, the last equation is that of a line in $m$-$c$ space, or parameter space. Repeating this reasoning, a second point $(x'', y'')$ will also have an associated line in parameter space and, furthermore, these lines will intersect at the point $(m', c')$, which corresponds to the line AB connecting these points. In fact, all points on the line AB will yield lines in parameter space which intersect at the point $(m', c')$, as shown in Fig. 4.5b.

This relation between image space $x$ and parameter space suggests the following algorithm for detecting lines:

Algorithm 4.1: Line Detection with the Hough Algorithm
1. Quantize parameter space between appropriate maximum and minimum values for $c$ and $m$.
2. Form an accumulator array $A(c, m)$ whose elements are initially zero.
3. For each point $(x, y)$ in a gradient image such that the strength of the gradient

Sec.4.3 The HoughMethod forCurveDetection


exceeds some threshold, increment all points in the accumulator array along the appropriate line, i.e.,

$$ A(c, m) := A(c, m) + 1 $$

for $m$ and $c$ satisfying $c = -mx + y$ within the limits of the digitization.
4. Local maxima in the accumulator array now correspond to collinear points in the image array. The values of the accumulator array provide a measure of the number of points on the line.
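Algorithm 4.1 translates almost line for line into Python. The sketch below is illustrative: it takes the edge points as already thresholded, quantizes $c$ with a fixed step and $m$ over a caller-supplied list of slopes, and reports only the global maximum rather than all local maxima. `hough_lines_mc` and `best_line` are hypothetical names.

```python
import numpy as np

def hough_lines_mc(points, m_values, c_min, c_max, c_step):
    """Algorithm 4.1: each point votes along c = -m x + y in a quantized (c, m) array."""
    n_c = int(round((c_max - c_min) / c_step)) + 1
    A = np.zeros((n_c, len(m_values)), dtype=int)        # step 2: A(c, m) = 0
    for x, y in points:                                  # step 3: thresholded edge points
        for j, m in enumerate(m_values):
            c = -m * x + y
            i = int(round((c - c_min) / c_step))
            if 0 <= i < n_c:                             # stay within the digitization
                A[i, j] += 1
    return A

def best_line(A, m_values, c_min, c_step):
    """Step 4, simplified: report only the global maximum of the accumulator."""
    i, j = np.unravel_index(np.argmax(A), A.shape)
    return m_values[j], c_min + i * c_step, A[i, j]
```

Ten points on y = 2x + 1, mixed with three unrelated points, produce a peak of 10 votes at m = 2, c = 1.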

This technique is generally known as the Hough technique [Hough 1962].

Since $m$ may be infinite in the slope-intercept equation, a better parameterization of the line is $x \sin\theta + y \cos\theta = r$. This produces a sinusoidal curve in $(r, \theta)$ space for fixed $x, y$, but otherwise the procedure is unchanged.

The generalization of this technique to other curves is straightforward, and this method works for any curve $f(x, a) = 0$, where $a$ is a parameter vector. (In this chapter we often use the symbol $f$ for various general functions unrelated to the image gray-level function.) In the case of a circle parameterized by

$$ (x - a)^2 + (y - b)^2 = r^2 \tag{4.1} $$

for fixed $x$, the modified Algorithm 4.1 increments values of $a, b, r$ lying on the surface of a cone. Unfortunately, the computation and the size of the accumulator array increase exponentially with the number of parameters, making this technique practical only for curves with a small number of parameters.

The Hough method is an efficient implementation of a generalized matched filtering strategy (i.e., a template matching paradigm). For instance, in the case of a circle, imagine a template composed of a circle of 1's (at a fixed radius $R$) and 0's everywhere else. If this template is convolved with the gradient image, the result is the portion of the accumulator array $A(a, b, R)$.

In its usual form, the technique yields a set of parameters for a curve that best explains the data. The parameters may specify an infinite curve (e.g., a line or parabola). Thus, if a finite curve segment is desired, some further processing is necessary to establish end points.
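With the $(r, \theta)$ parameterization the same voting loop handles vertical lines, which have infinite slope. In this sketch (NumPy assumed, names hypothetical), each point adds one vote along its sinusoid $r = x\sin\theta + y\cos\theta$ for $\theta \in [0, \pi)$, and $r$ is quantized in steps of `r_step` over a range that covers every point.

```python
import numpy as np

def hough_lines_rtheta(points, n_theta=180, r_step=1.0):
    """Vote along r = x sin(theta) + y cos(theta); return (r, theta, accumulator)."""
    pts = np.asarray(points, dtype=float)
    thetas = np.arange(n_theta) * np.pi / n_theta              # theta in [0, pi)
    r_max = np.ceil(np.hypot(*np.abs(pts).max(axis=0)))        # bound on |r|
    n_r = int(np.ceil(2 * r_max / r_step)) + 1
    A = np.zeros((n_r, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in pts:
        r = x * np.sin(thetas) + y * np.cos(thetas)             # one sinusoid per point
        rows = np.rint((r + r_max) / r_step).astype(int)
        A[rows, cols] += 1                                      # one vote per theta bin
    i, j = np.unravel_index(np.argmax(A), A.shape)
    return i * r_step - r_max, thetas[j], A
```

Because neighboring $\theta$ bins often collect the same votes after quantization, the reported $\theta$ for the vertical line x = 5 may be a few degrees away from $\pi/2$, while $r$ comes out as 5.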
4.3.1 Use of the Gradient

Dramatic reductions in the amount of computation can be achieved if the gradient direction is integrated into the algorithm [Kimme et al. 1975]. For example, consider the problem of detecting a circle of fixed radius $R$. Without gradient information, all values $a, b$ lying on the circle given by (4.1) are incremented. With the gradient direction, only the points near $(a, b)$ in Fig. 4.6 need be incremented. From geometrical considerations, the point $(a, b)$ is given by


Fig. 4.6 Reduction in computation with gradient information. [Figure: contents of the accumulator array; gradient direction information for artifact A, $\phi = 45°$; pixels in $P(x)$ superimposed on the accumulator array; gradient directions.]

$$ a = x - r\sin\phi, \qquad b = y + r\cos\phi \tag{4.2} $$

where $\phi(x)$ is the gradient angle returned by an edge operator. Implicit in these equations is the assumption that the circle is the boundary of a disk that has gray levels greater than its surroundings. These equations may also be derived by differentiating (4.1), recognizing that $dy/dx = \tan\phi$, and solving for $a$ and $b$ between the resultant equation and (4.1). Similar methods can be applied to other conics. In each case, the use of the gradient saves one dimension in the accumulator array.

The gradient magnitude can also be used as a heuristic in the incrementing procedure. Instead of incrementing by unity, the accumulator array location may be incremented by a function of the gradient magnitude. This heuristic can balance the magnitude of brightness change across a boundary with the boundary length, but it can lead to detection of phantom lines indicated by a few bright points, or to missing dim but coherent boundaries.
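The saving is easy to see in code: each edge point casts a single vote at the center predicted by Eq. (4.2) instead of voting along a whole circle. This sketch assumes NumPy, a known radius R, and angles supplied with the convention $dy/dx = \tan\phi$ used in the text. The test uses $\phi = t + \pi/2$ for a point at polar angle t on the circle, which is the orientation for which Eq. (4.2) points to the center; the function name is hypothetical.

```python
import numpy as np

def hough_circle_fixed_radius(edge_points, phis, R, shape):
    """Vote once per edge point at the center given by Eq. (4.2).

    edge_points: (x, y) pairs; phis: angles with dy/dx = tan(phi) as in the text;
    shape: (rows, cols) of the accumulator, indexed A[b, a].
    """
    A = np.zeros(shape, dtype=int)
    for (x, y), phi in zip(edge_points, phis):
        a = int(round(x - R * np.sin(phi)))     # a = x - r sin(phi)
        b = int(round(y + R * np.cos(phi)))     # b = y + r cos(phi)
        if 0 <= a < shape[1] and 0 <= b < shape[0]:
            A[b, a] += 1                        # a single cell instead of a whole circle
    return A
```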
4.3.2 Some Examples

The Hough technique has been used successfully in a variety of domains. Some examples include the detection of human hemoglobin fingerprints [Ballard et al. 1975], the detection of tumors in chest films [Kimme et al. 1975], the detection of storage tanks in aerial images [Lantz et al. 1978], and the detection of ribs in chest radiographs [Wechsler and Sklansky 1977]. Figure 4.7 shows the tumor detection application. A section of the chest film (Fig. 4.7b) is searched for disks of radius 3 units. In Fig. 4.7c, the resultant accumulator array $A[a, b, 3]$ is shown in a pictorial fashion, by interpreting the array values as gray levels. This process is repeated for various radii and then a set of likely circles is chosen by setting a radius-dependent threshold for the accumulator array contents. This result is shown in Fig. 4.7d. The


Fig. 4.7 Using the Hough technique for circular shapes. (a) Radiograph. (b) Window. (c) Accumulator array for r = 3. (d) Results of maxima detection.

circular boundaries detected by the Hough technique are overlaid on the original image.


4.3.3 Trading Off Work in Parameter Space for Work in Image Space

Consider the example of detecting ellipses that are known to be oriented so that a principal axis is parallel to the x axis. These can be specified by four parameters. Using the equation for the ellipse together with its derivative, and substituting for the known gradient as before, one can solve for two parameters. In the equation


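$$ \frac{(x - x_0)^2}{a^2} + \frac{(y - y_0)^2}{b^2} = 1 \tag{4.3} $$

$x$ is an edge point and $x_0$, $y_0$, $a$, and $b$ are parameters. The equation for its derivative is

$$ \frac{x - x_0}{a^2} + \frac{y - y_0}{b^2}\,\frac{dy}{dx} = 0 \tag{4.4} $$

where $dy/dx = \tan\phi(x)$. The Hough algorithm becomes: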

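Algorithm 4.2: Hough technique applied to ellipses
For each edge point $(x, y)$ and each discrete pair of values $a, b$, increment the point in parameter space given by $a, b, x_0, y_0$, where

$$ x_0 = x \pm \frac{a}{\left(1 + \dfrac{b^2}{a^2 \tan^2\phi}\right)^{1/2}} \tag{4.5} $$

$$ y_0 = y \pm \frac{b}{\left(1 + \dfrac{a^2 \tan^2\phi}{b^2}\right)^{1/2}} \tag{4.6} $$

that is,

$$ A(a, b, x_0, y_0) := A(a, b, x_0, y_0) + 1 $$

(The two signs are not independent: by (4.4), $(y - y_0)\tan\phi = -(b^2/a^2)(x - x_0)$.)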
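Algorithm 4.2 can be sketched in Python as follows. The sketch assumes NumPy, discrete lists of candidate $a$ and $b$ values, and edge points supplied as $(x, y, \phi)$ triples with $\tan\phi = dy/dx$. It resolves the ± signs of Eqs. (4.5) and (4.6) using the coupling implied by Eq. (4.4), so each $(a, b)$ pair produces two candidate centers rather than four, and it treats a horizontal tangent ($\tan\phi = 0$) as a special case. The function names are hypothetical.

```python
import numpy as np

def ellipse_centers(x, y, phi, a, b):
    """Candidate (x0, y0) from Eqs. (4.5)-(4.6), signs coupled through Eq. (4.4)."""
    t = np.tan(phi)
    if np.isclose(t, 0.0):                          # horizontal tangent: top or bottom point
        return [(x, y - b), (x, y + b)]
    dx = a / np.sqrt(1.0 + b**2 / (a**2 * t**2))    # |x - x0| from (4.5)
    dy = b / np.sqrt(1.0 + a**2 * t**2 / b**2)      # |y - y0| from (4.6)
    out = []
    for sx in (1.0, -1.0):                          # sx = sign of (x - x0)
        # (4.4): (y - y0) = -(b^2 / a^2)(x - x0) / tan(phi), so the signs are linked.
        sy = -sx * np.sign(t)
        out.append((x - sx * dx, y - sy * dy))
    return out

def hough_ellipse(edges, a_values, b_values, shape):
    """Algorithm 4.2: accumulate A[a, b, y0, x0] over all discrete (a, b)."""
    A = np.zeros((len(a_values), len(b_values)) + tuple(shape), dtype=int)
    for x, y, phi in edges:
        for i, a in enumerate(a_values):
            for j, b in enumerate(b_values):
                for x0, y0 in ellipse_centers(x, y, phi, a, b):
                    c, r = int(round(x0)), int(round(y0))
                    if 0 <= r < shape[0] and 0 <= c < shape[1]:
                        A[i, j, r, c] += 1
    return A
```

Each edge point contributes one vote per $(a, b)$ pair at the true center, so for a sampled ellipse with center (30, 20), $a = 10$, and $b = 5$, the peak collects one vote per sample.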

For $a$ and $b$ each having $m$ values, the computational cost is proportional to $m^2$. Now suppose that we consider all pairwise combinations of edge elements. This introduces two additional equations like (4.3) and (4.4), and now the four-parameter point can be determined exactly. That is, the following equations can be solved for a unique $x_0, y_0, a, b$.
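$$ \frac{(x_1 - x_0)^2}{a^2} + \frac{(y_1 - y_0)^2}{b^2} = 1 \tag{4.7a} $$

$$ \frac{(x_2 - x_0)^2}{a^2} + \frac{(y_2 - y_0)^2}{b^2} = 1 \tag{4.7b} $$

$$ \frac{x_1 - x_0}{a^2} + \frac{y_1 - y_0}{b^2}\left.\frac{dy}{dx}\right|_{x_1} = 0 \tag{4.7c} $$

$$ \frac{x_2 - x_0}{a^2} + \frac{y_2 - y_0}{b^2}\left.\frac{dy}{dx}\right|_{x_2} = 0 \tag{4.7d} $$

where $dy/dx = \tan\phi$ is known from the edge operator at each point.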


Their solution is left as an exercise. The amount of effort in the former case was proportional to the product of the number of discrete values of $a$ and $b$, whereas this case involves effort proportional to the square of the number of edge elements.

4.3.4 Generalizing the Hough Transform

Consider the case where the object being sought has no simple analytic form, but has a particular silhouette. Since the Hough technique is so closely related to template matching, and template matching can handle this case, it is not surprising that the Hough technique can be generalized to handle this case also. Suppose for the moment that the object appears in the image with known shape, orientation, and scale. (If orientation and scale are unknown, they can be handled in the same way that additional parameters were handled earlier.) Now pick a reference point in the silhouette and draw a line to the boundary. At the boundary point compute the gradient direction and store the reference point as a function of this direction. Thus it is possible to precompute the location of the reference point from boundary points given the gradient angle. The set of all such locations, indexed by gradient angle, comprises a table termed the R-table [Ballard 1981]. Remember that the basic strategy of the Hough technique is to compute the possible loci of reference points in parameter space from edge point data in image space and increment the parameter points in an accumulator array. Figure 4.8 shows the relevant geometry and Table 4.1 shows the form of the R-table. For the moment, the reference point coordinates $(x_c, y_c)$ are the only parameters (assuming that rotation and scaling have been fixed). Thus an edge point $(x, y)$ with gradient orientation $\phi$ constrains the possible reference points to be at $\{x + r_1(\phi)\cos[\alpha_1(\phi)],\ y + r_1(\phi)\sin[\alpha_1(\phi)]\}$ and so on.

Fig. 4.8 Geometry used to form the R-table.


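Table 4.1 INCREMENTATION IN THE GENERALIZED HOUGH CASE

$$
\begin{array}{c|l}
\text{Angle measured from figure boundary to reference point} & \text{Set of radii } \{\mathbf{r}^k\}, \text{ where } \mathbf{r} = (r, \alpha) \\
\hline
\phi_1 & \mathbf{r}^1_1, \mathbf{r}^1_2, \ldots, \mathbf{r}^1_{n_1} \\
\phi_2 & \mathbf{r}^2_1, \mathbf{r}^2_2, \ldots, \mathbf{r}^2_{n_2} \\
\vdots & \vdots \\
\phi_m & \mathbf{r}^m_1, \mathbf{r}^m_2, \ldots, \mathbf{r}^m_{n_m}
\end{array}
$$

The generalized Hough algorithm may be described as follows: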

Algorithm 4.3: Generalized Hough
Step 0. Make a table (like Table 4.1) for the shape to be located.
Step 1. Form an accumulator array of possible reference points $A(x_{c\min} : x_{c\max},\ y_{c\min} : y_{c\max})$ initialized to zero.
Step 2. For each edge point do the following:
  Step 2.1. Compute $\phi(x)$.
  Step 2.2a. Calculate the possible centers; that is, for each table entry for $\phi$, compute
    $x_c := x + r(\phi)\cos[\alpha(\phi)]$
    $y_c := y + r(\phi)\sin[\alpha(\phi)]$
  Step 2.2b. Increment the accumulator array
    $A(x_c, y_c) := A(x_c, y_c) + 1$
Step 3. Possible locations for the shape are given by maxima in array $A$.
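A minimal sketch of Algorithm 4.3 with rotation and scale fixed is given below. It assumes NumPy, gradient angles quantized into `n_bins` equal bins to index the R-table, and an R-table built from a model boundary whose gradient angles are already known. `phi_bin`, `build_r_table`, and `generalized_hough` are hypothetical names.

```python
import numpy as np
from collections import defaultdict

def phi_bin(phi, n_bins):
    """Quantize an angle into one of n_bins equal bins over [0, 2*pi)."""
    return int(np.floor((phi % (2 * np.pi)) / (2 * np.pi) * n_bins)) % n_bins

def build_r_table(boundary, phis, ref, n_bins=36):
    """Step 0: for each model boundary point, store (r, alpha) to the reference point."""
    table = defaultdict(list)
    for (x, y), phi in zip(boundary, phis):
        dx, dy = ref[0] - x, ref[1] - y
        table[phi_bin(phi, n_bins)].append((np.hypot(dx, dy), np.arctan2(dy, dx)))
    return table

def generalized_hough(edges, table, shape, n_bins=36):
    """Steps 1-3 with rotation and scale fixed; edges are (x, y, phi) triples."""
    A = np.zeros(shape, dtype=int)                    # step 1: A[yc, xc] = 0
    for x, y, phi in edges:                           # step 2
        for r, alpha in table.get(phi_bin(phi, n_bins), []):
            xc = int(round(x + r * np.cos(alpha)))    # step 2.2a
            yc = int(round(y + r * np.sin(alpha)))
            if 0 <= yc < shape[0] and 0 <= xc < shape[1]:
                A[yc, xc] += 1                        # step 2.2b
    return A                                          # step 3: look for maxima in A
```

Boundary points that share a gradient-angle bin also vote for each other's offsets, so the accumulator contains spurious votes; the true reference point still collects one vote from every edge point.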

The results of using this transform to detect a shape are shown in Fig. 4.9. Figure 4.9a shows an image of shapes. The R-table has been made for the middle shape. Figure 4.9b shows the Hough transform for the shape, that is, $A(x_c, y_c)$ displayed as an image. Figure 4.9c shows the shape given by the maxima of


Fig. 4.9 Applying the Generalized Hough technique. (a) Synthetic image. (b) Hough transform $A(x_c, y_c)$ for middle shape. (c) Detected shape. (d) Same shape in an aerial image setting.

$A(x_c, y_c)$ overlaid on top of the image. Finally, Fig. 4.9d shows the Hough transform used to detect a pond of the same shape in an aerial image.

What about the parameters of scale and rotation, $S$ and $\theta$? These are readily accommodated by expanding the accumulator array and doing more work in the incrementation step. Thus in step 1 the accumulator array is changed to
$$ A(x_{c\min} : x_{c\max},\ y_{c\min} : y_{c\max},\ S_{\min} : S_{\max},\ \theta_{\min} : \theta_{\max}) $$

and step 2.2a is changed to


for each table entry for $\phi$ do
  for each $S$ and $\theta$
    $x_c := x + r(\phi)\,S\cos[\alpha(\phi) + \theta]$
    $y_c := y + r(\phi)\,S\sin[\alpha(\phi) + \theta]$

Finally, step 2.2b is now
  $A(x_c, y_c, S, \theta) := A(x_c, y_c, S, \theta) + 1$
4.4 EDGE FOLLOWING AS GRAPH SEARCHING

A graph is a general object that consists of a set of nodes $\{n_i\}$ and arcs between nodes $\langle n_i, n_j \rangle$. In this section we consider graphs whose arcs may have numerical weights or costs associated with them. The search for the boundary of an object is cast as a search for the lowest-cost path between two nodes of a weighted graph.

Assume that a gradient operator is applied to the gray-level image, creating the magnitude image $s(x)$ and direction image $\phi(x)$. Now interpret the elements of the direction image $\phi(x)$ as nodes in a graph, each with a weighting factor $s(x)$. Nodes $x_i$, $x_j$ have arcs between them if the contour directions $\phi(x_i)$, $\phi(x_j)$ are appropriately aligned with the arc directed in the same sense as the contour direction. Figure 4.10 shows the interpretation. To generate Fig. 4.10b impose the following restrictions. For an arc to connect from $x_i$ to $x_j$, $x_j$ must be one of the three possible eight-neighbors in front of the contour direction $\phi(x_i)$ and, furthermore, $s(x_i)$
Fig. 4.10 Interpreting a gradient image as a graph (see text).

Sec. 4.4 Edge Following asGraph Searching


$> T$, $s(x_j) > T$, where $T$ is a chosen constant, and $|\{[\phi(x_i) - \phi(x_j)] \bmod 2\pi\}| < \pi/2$. (Any or all of these restrictions may be modified to suit the requirements of a particular problem.)

To generate a path in a graph from $x_A$ to $x_B$ one can apply the well-known technique of heuristic search [Nilsson 1971, 1980]. The specific use of heuristic search to follow edges in images was first proposed by [Martelli 1972]. Suppose:
1. That the path should follow contours that are directed from $x_A$ to $x_B$
2. That we have a method for generating the successor nodes of a given node (such as the heuristic described above)
3. That we have an evaluation function $f(x_i)$ which is an estimate of the optimal cost path from $x_A$ to $x_B$ constrained to go through $x_i$

Nilsson expresses $f(x_i)$ as the sum of two components: $g(x_i)$, the estimated cost of journeying from the start node $x_A$ to $x_i$, and $h(x_i)$, the estimated cost of the path from $x_i$ to $x_B$, the goal node.

With the foregoing preliminaries, the heuristic search algorithm (called the A algorithm by Nilsson) can be stated as:

Algorithm 4.4: Heuristic Search (the A Algorithm)
1. "Expand" the start node (put the successors on a list called OPEN with pointers back to the start node).
2. Remove the node $x_i$ of minimum $f$ from OPEN. If $x_i = x_B$, then stop. Trace back through pointers to find the optimal path. If OPEN is empty, fail.
3. Else expand node $x_i$, putting successors on OPEN with pointers back to $x_i$. Go to step 2.

The component $h(x_i)$ plays an important role in the performance of the algorithm; if $h(x_i) = 0$ for all $i$, the algorithm is a minimum-cost search as opposed to a heuristic search. If $h(x_i) > h^*(x_i)$ (the actual optimal cost), the algorithm may run faster, but may miss the minimum-cost path. If $h(x_i) \le h^*(x_i)$, the search will always produce a minimum-cost path, provided that $h$ also satisfies the following consistency condition: If for any two nodes $x_i$ and $x_j$, $k(x_i, x_j)$ is the minimum cost of getting from $x_i$ to $x_j$ (if possible), then

$$ k(x_i, x_j) \ge h(x_i) - h(x_j) $$

With our edge elements, there is no guarantee that a path can be found since there may be insurmountable gaps between $x_A$ and $x_B$. If finding the edge is crucial, steps should be taken to interpolate edge elements prior to the search, or gaps may be crossed by using the edge element definition of [Martelli 1972]. He defines

edges on the image grid structure so that an edge can have a direction even though there is no local gray-level change. This definition is depicted in Fig. 4.11a.
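Algorithm 4.4 can be sketched with a priority queue serving as OPEN. This is a hedged illustration, not Martelli's program: successor generation, arc costs, and $h$ are passed in as functions, and an expanded node is placed on a closed set and never reopened, which is safe only when $h$ satisfies the consistency condition stated above (with $h = 0$ it reduces to the minimum-cost search mentioned in the text). The name `a_search` is hypothetical.

```python
import heapq
import itertools

def a_search(start, goal, successors, cost, h):
    """Algorithm 4.4 with f = g + h; successors(n) yields nodes, cost(n, m) >= 0."""
    tie = itertools.count()                       # tie-breaker so nodes are never compared
    open_list = [(h(start), next(tie), start)]    # step 1: OPEN holds the start node
    g = {start: 0.0}
    parent = {start: None}
    closed = set()
    while open_list:                              # an empty OPEN means failure
        _, _, n = heapq.heappop(open_list)        # step 2: node of minimum f
        if n in closed:
            continue
        if n == goal:                             # trace back through the pointers
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1], g[goal]
        closed.add(n)
        for m in successors(n):                   # step 3: expand n
            new_g = g[n] + cost(n, m)
            if m not in g or new_g < g[m]:
                g[m], parent[m] = new_g, n
                heapq.heappush(open_list, (new_g + h(m), next(tie), m))
    return None, float("inf")
```

In the test, the successor rule is a deliberately simplified, hypothetical one (the contour always heads to the right, so the three eight-neighbors in front lie in the next column), and the arc cost is $M - s(x)$, the edge-strength term of Section 4.4.1.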
4.4.1 Good Evaluation Functions

A good evaluation function has components specific to the particular task as well as components that are relatively task-independent. The latter components are discussed here.
1. Edge strength. If edge strength is a factor, the cost of adding a particular edge element at $x$ can be included as
$$ M - s(x), \qquad \text{where } M = \max_x s(x) $$
2. Curvature. If low-curvature boundaries are desirable, curvature can be measured as some monotonically increasing function of
$$ \operatorname{diff}[\phi(x_i) - \phi(x_j)] $$
where diff measures the angle between the edge elements at $x_i$ and $x_j$.
3. Proximity to an approximation. If an approximate boundary is known, boundaries near this approximation can be favored by adding
$$ d = \operatorname{dist}(x_i, B) $$
to the cost measure. The dist operator measures the minimum distance of the new point $x_i$ to the approximate boundary $B$.
4. Estimates of the distance to the goal. If the curve is reasonably linear, points near the goal may be favored by estimating $h$ as $d(x_i, x_{\text{goal}})$, where $d$ is a distance measure.

Specific implementations of these measures appear in [Ashkar and Modestino 1978; Lester et al. 1978].
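The task-independent terms above can be combined into an arc-cost function and a goal estimate suitable for a heuristic search. In this sketch the weights `w_edge`, `w_curv`, and `w_prox` are hypothetical. The monotonic curvature function is simply the absolute angle difference, and dist is taken as the Euclidean distance to a point-sampled approximate boundary B. Pixels are addressed as (row, col) index tuples.

```python
import numpy as np

def angle_diff(p, q):
    """Smallest absolute difference between two angles (radians), in [0, pi]."""
    d = (p - q) % (2 * np.pi)
    return min(d, 2 * np.pi - d)

def make_arc_cost(s, phi, approx_boundary=None, w_edge=1.0, w_curv=1.0, w_prox=1.0):
    """Cost of stepping from pixel xi to xj, combining terms 1-3 with hypothetical weights."""
    M = float(np.max(s))
    B = None if approx_boundary is None else np.asarray(approx_boundary, dtype=float)

    def cost(xi, xj):
        c = w_edge * (M - s[xj])                                  # 1. edge strength
        c += w_curv * angle_diff(phi[xi], phi[xj])                # 2. curvature
        if B is not None:                                         # 3. proximity to B
            c += w_prox * np.min(np.hypot(*(B - np.asarray(xj, dtype=float)).T))
        return c
    return cost

def make_goal_estimate(goal):
    """4. h(xi) as the Euclidean distance to the goal (one possible choice of d)."""
    return lambda xi: float(np.hypot(xi[0] - goal[0], xi[1] - goal[1]))
```

When edge-strength costs can be zero, this Euclidean goal estimate can exceed the true remaining cost; as the discussion of Algorithm 4.4 points out, such an $h$ may speed the search but may miss the minimum-cost path.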
4.4.2 Finding All the Boundaries

What if the objective is to find all boundaries in the image using heuristic search? In one system [Ramer 1975] Hueckel's operator (Chapter 3) is used to obtain

Fig. 4.11 Successor conventions in heuristic search (see text).

strokes, another name for the magnitude and direction of the local gray-level changes. Then these strokes are combined by heuristic search to form sequences of edge elements called streaks. Streaks are an intermediate organization which are used to assure a slightly broader coherence than is provided by the individual Hueckel edges. A bidirectional search is used with four eight-neighbors defined in front of the edge and four eight-neighbors behind the edge, as shown in Fig. 4.11b. The search algorithm is as follows:
1. Scan the stroke (edge) array for the most prominent edge.
2. Search in front of the edge until no more successors exist (i.e., a gap is encountered).
3. Search behind the edge until no more predecessors exist.
4. If the bidirectional search generates a path of 3 or more strokes, the path is a streak. Store it in a streak list and go to step 1.

Strokes that are part of a streak cannot be reused; they are marked when used and subsequently skipped.

There are other heuristic procedures for pruning the streaks to retain only prime streaks. These are shown in Fig. 4.12. They are essentially similar to the relaxation operations described in Section 3.3.5. The resultant streaks must still be analyzed to determine the objects they represent. Nevertheless, this method represents a cogent attempt to organize bottom-up edge following in an image. Fig. 4.13 shows an example of Ramer's technique.

Fig. 4.12 Operations in the creation of prime streaks.

Fig. 4.13 Ramer's results.
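The streak-forming loop can be sketched as follows. This is a simplified, hypothetical version of Ramer's procedure: strokes are given as a dictionary from (row, col) to (magnitude, direction), the neighbors "in front" are taken to be the three eight-neighbors within 45° of the stroke direction (Ramer used four in front and four behind), the strongest compatible neighbor is taken as the successor, and the prime-streak pruning of Fig. 4.12 is omitted. Function names are hypothetical.

```python
import math

OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def _adiff(a, b):
    """Smallest absolute difference between two angles."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def _next(strokes, used, node, theta, tol):
    """Strongest unused neighbor roughly ahead of direction theta with a similar direction."""
    r, c = node
    best = None
    for dr, dc in OFFSETS:
        m = (r + dr, c + dc)
        if m in used or m not in strokes:
            continue
        ahead = _adiff(math.atan2(dr, dc), theta) <= math.pi / 4 + 1e-9
        if ahead and _adiff(strokes[m][1], strokes[node][1]) < tol:
            if best is None or strokes[m][0] > strokes[best][0]:
                best = m
    return best

def find_streaks(strokes, tol=math.pi / 4, min_len=3):
    """strokes: {(row, col): (magnitude, direction)}; returns a list of streaks."""
    used, streaks = set(), []
    for start in sorted(strokes, key=lambda n: -strokes[n][0]):   # most prominent first
        if start in used:
            continue                                  # strokes in a streak are skipped
        path, used_here = [start], {start}
        for sense, grow in ((0.0, path.append), (math.pi, lambda n: path.insert(0, n))):
            node = start                              # search in front, then behind
            while True:
                nxt = _next(strokes, used | used_here, node, strokes[node][1] + sense, tol)
                if nxt is None:
                    break                             # a gap ends this direction
                grow(nxt)
                used_here.add(nxt)
                node = nxt
        if len(path) >= min_len:                      # 3 or more strokes: a streak
            streaks.append(path)
            used |= used_here                         # mark strokes as used
    return streaks
```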


4.4.3 Alternatives to the A Algorithm

The primary disadvantage with the heuristic search method is that the algorithm must keep track of a set of current best paths (nodes), and this set may become very large. These nodes represent tip nodes for the portion of the tree of possible paths that has been already examined. Also, since all the costs are nonnegative, a good path may eventually look expensive compared to tip nodes near the start node. Thus, paths from these newer nodes will be extended by the algorithm even though, from a practical standpoint, they are unlikely. Because of these disadvantages, other less rigorous search procedures have proven to be more practical, five of which are described below.

Pruning the Tree of Alternatives
At various points in the algorithm the tip nodes on the OPEN list can be pruned in some way. For example, paths that are short or have a high cost per unit length can be discriminated against. This pruning operation can be carried out whenever the number of alternative tip nodes exceeds some bound.

Modified Depth-First Search
Depth-first search is a meaningful concept if the search space is structured as a tree. Depth-first search means always evaluating the most recently expanded son. This type of search is performed if the OPEN list is structured as a stack in the A algorithm and the top node is always evaluated next. Modifications to this method use an evaluation function $f$ to rate the successor nodes and expand the best of these. Practical examples can be seen in [Ballard and Sklansky 1976; Wechsler and Sklansky 1977; Persoon 1976].

Least Maximum Cost
In this elegant idea [Lester 1978], only the maximum-cost arc of each path is kept as an estimate of $g$. This is like finding a mountain pass at minimum altitude. The advantage is that $g$ does not build up continuously with depth in the search tree, so that good paths may be followed for a long time. This technique has been applied to finding the boundaries of blood cells in optical microscope images. Some results are shown in Fig. 4.14.
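Least maximum cost changes only how $g$ is updated: instead of adding the arc cost, it keeps the larger of the path's previous worst arc and the new one. In the sketch below (hypothetical names, $h$ omitted), this is a best-first search on that bottleneck value. Because the maximum never decreases along a path, the first time the goal is removed from OPEN its value is the smallest achievable worst arc.

```python
import heapq
import itertools

def least_maximum_cost_path(start, goal, successors, cost):
    """Best-first search in which g(path) is the largest single arc cost along the path."""
    tie = itertools.count()                     # tie-breaker; nodes are never compared
    open_list = [(0.0, next(tie), start)]
    g, parent, closed = {start: 0.0}, {start: None}, set()
    while open_list:
        gn, _, n = heapq.heappop(open_list)
        if n in closed:
            continue
        if n == goal:                           # trace back through the pointers
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1], gn
        closed.add(n)
        for m in successors(n):
            new_g = max(gn, cost(n, m))         # keep only the worst arc; do not add
            if m not in g or new_g < g[m]:
                g[m], parent[m] = new_g, n
                heapq.heappush(open_list, (new_g, next(tie), m))
    return None, float("inf")
```

In the test graph, the direct arc A to D (cost 6) has the lower total cost, but the path A, B, C, D wins because its worst arc costs only 4.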
Branch and Bound

The crux of this method is to have some upper bound on the cost of the path [Chien and Fu 1974]. This may be known beforehand or may be computed by actually generating a path between the desired end points. Also, the evaluation function must be monotonically increasing with the length of the path. With these conditions we start generating paths, excluding partial paths when they exceed the current bound.

Modified Heuristic Search

Sometimes an evaluation function that assigns negative costs leads to good results. Thus good paths keep getting better with respect to the evaluation function, avoiding the problem of having to look at all paths near the starting point.

Ch. 4 Boundary Detection


Fig. 4.14 Using least maximum cost in heuristic search to find cell boundaries in microscope images. (a) A stage in the search process. (b) The completed boundary.

However, the price paid is the sacrifice of the mathematical guarantee of finding the least-cost path. This could be reflected in unsatisfactory boundaries. This method has been used in cineangiograms with satisfactory results [Ashkar and Modestino 1978].
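Returning to the branch-and-bound method described above, here is a minimal depth-first sketch in Python. It assumes nonnegative arc costs, so that path cost increases monotonically with path length as the method requires, and an optional initial upper bound supplied by the caller. The adjacency-dictionary graph format and the function name are hypothetical.

```python
def branch_and_bound(graph, start, goal, bound=float("inf")):
    """Depth-first branch and bound for a least-cost path.

    graph: dict node -> list of (neighbor, nonnegative arc cost).
    bound: initial upper bound on the path cost, e.g. the cost of a path
           already generated between the end points.
    Returns (cost, path); path is [] if no path beats the initial bound.
    """
    best_path = []

    def extend(node, cost, path):
        nonlocal bound, best_path
        if cost >= bound:
            return                          # cannot beat the bound: prune
        if node == goal:
            bound, best_path = cost, path   # tighter bound found
            return
        for nbr, arc in graph.get(node, []):
            if nbr not in path:             # do not revisit nodes on this path
                extend(nbr, cost + arc, path + [nbr])

    extend(start, 0, [start])
    return bound, best_path

graph = {"A": [("B", 3), ("C", 1)], "B": [("D", 3)], "C": [("D", 4)], "D": []}
print(branch_and_bound(graph, "A", "D"))  # (5, ['A', 'C', 'D'])
```

The first complete path found (A, B, D at cost 6) becomes the bound; the partial path through C is then kept only because its cost stays below that bound.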
4.5 EDGE FOLLOWING AS DYNAMIC PROGRAMMING

4.5.1 Dynamic Programming

Dynamic programming [Bellman and Dreyfus 1962] is a technique for solving optimization problems when not all variables in the evaluation function are interrelated simultaneously. Consider the problem

$$\max_{x_1, x_2, x_3, x_4} h(x_1, x_2, x_3, x_4) \tag{4.8}$$

If nothing is known about $h$, the only technique that guarantees a global maximum is exhaustive enumeration of all combinations of discrete values of $x_1, \ldots, x_4$. Suppose that

$$h(\cdot) = h_1(x_1, x_2) + h_2(x_2, x_3) + h_3(x_3, x_4) \tag{4.9}$$

$x_1$ appears only in $h_1$. Maximize over $x_1$ in $h_1$ and tabulate the best value of $h_1(x_1, x_2)$ for each $x_2$:

$$f_1(x_2) = \max_{x_1} h_1(x_1, x_2) \tag{4.10}$$

Since the values of $h_2$ and $h_3$ do not depend on $x_1$, they need not be considered at this point. Continue in this manner and eliminate $x_2$ by computing $f_2(x_3)$ as

$$f_2(x_3) = \max_{x_2}\left[f_1(x_2) + h_2(x_2, x_3)\right] \tag{4.11}$$

and

$$f_3(x_4) = \max_{x_3}\left[f_2(x_3) + h_3(x_3, x_4)\right] \tag{4.12}$$

so that finally

$$\max h = \max_{x_4} f_3(x_4) \tag{4.13}$$

Generalizing the example to $N$ variables, where $f_0(x_1) = 0$,

$$f_{n-1}(x_n) = \max_{x_{n-1}}\left[f_{n-2}(x_{n-1}) + h_{n-1}(x_{n-1}, x_n)\right] \tag{4.14}$$

$$\max_{x_1, \ldots, x_N} h(x_1, \ldots, x_N) = \max_{x_N} f_{N-1}(x_N) \tag{4.15}$$
If each $x_i$ took on 20 discrete values, then to compute $f_{n-1}(x_n)$ for every value of $x_n$ one must evaluate the maximand for $20 \times 20$ combinations of $x_{n-1}$ and $x_n$, so that the resultant computational effort involves $(N-1)20^2 + 20$ such evaluations. This is a striking improvement over exhaustive evaluation, which would involve $20^N$ evaluations of $h$.

Consider the artificial example summarized in Table 4.2. In this example, each $x$ can take on one of three discrete values. The $h_i$ are completely described by their respective tables. For example, the value of $h_1(0, 1) = 5$. The solution steps are summarized in Table 4.3. In step 1, for each $x_2$ the value of $x_1$ that maximizes $h_1(x_1, x_2)$ is computed. This is the largest entry in each of the columns of $h_1$. Store the function value as $f_1(x_2)$ and the optimizing value of $x_1$ also as a function of $x_2$. In step 2, add $f_1(x_2)$ to $h_2(x_2, x_3)$. This is done by adding $f_1$ to each row of $h_2$, thus computing the quantity inside the brackets of (4.11). Now, to complete step 2, for each $x_3$, compute the $x_2$ that maximizes $h_2 + f_1$ by selecting the largest entry in each row of the appropriate table. The rest of the steps are straightforward once these are understood. The solution is found by tracing back through the tables. For example, for $x_4 = 2$ we see that the best $x_3$ is 1, and therefore the best $x_2$ is 3 and $x_1$ is 1. This step is denoted by arrows.
Table 4.2 DEFINITION OF h (tables of $h_1(x_1, x_2)$, $h_2(x_2, x_3)$, and $h_3(x_3, x_4)$)

Table 4.3 METHOD OF SOLUTION USING DYNAMIC PROGRAMMING (Steps 1 to 3)

Step 4: Optimal $x_i$'s are found by examining tables (dashed line shows the order in which they are recovered).

Solution: $h^* = 22$; $x_1^* = 1$, $x_2^* = 3$, $x_3^* = 1$, $x_4^* = 2$
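The tabular procedure of Section 4.5.1 can be written compactly. The Python sketch below assumes each $h_i$ is supplied as a nested list indexed by 0-based discrete values, so that `h[i][a][b]` links variable $i$ (value `a`) to variable $i+1$ (value `b`). It carries out the forward recursion ($f_1, f_2, \ldots$) and then traces back through the stored argmax tables. The function name and the numbers in the usage line are illustrative only; they are not the values of Table 4.2.

```python
def chain_dp(h):
    """Maximize h_1(x_1,x_2) + ... + h_{N-1}(x_{N-1},x_N) by dynamic programming.

    h: list of N-1 tables; h[i][a][b] is the term linking variable i
       (value a) to variable i+1 (value b), both 0-based.
    Returns (maximum value, optimal values of x_1..x_N).
    """
    f = [0] * len(h[0])          # f_0(x_1) = 0 for every value of x_1
    back = []                    # back[i][b]: best value of variable i given b
    for table in h:
        new_f, arg = [], []
        for b in range(len(table[0])):
            # maximize f(a) + h_i(a, b) over a, one column of the table
            a_best = max(range(len(table)), key=lambda a: f[a] + table[a][b])
            new_f.append(f[a_best] + table[a_best][b])
            arg.append(a_best)
        f = new_f
        back.append(arg)
    x_last = max(range(len(f)), key=lambda b: f[b])
    xs = [x_last]
    for arg in reversed(back):   # trace back through the tables
        xs.append(arg[xs[-1]])
    return f[x_last], xs[::-1]

h = [[[1, 2], [3, 0]],           # h_1(x_1, x_2), illustrative values
     [[0, 5], [2, 1]]]           # h_2(x_2, x_3), illustrative values
print(chain_dp(h))  # (8, [1, 0, 1])
```

Each stage touches every pair of adjacent values once, which is the source of the $(N-1)m^2$ cost quoted above for $m$ values per variable.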

4.5.2 Dynamic Programming for Images

To formulate the boundary-following procedure as dynamic programming, one must define an evaluation function that embodies a notion of the "best boundary" [Montanari 1971; Ballard 1976]. Suppose that a local edge detection operator is applied to a gray-level picture to produce edge magnitude and direction information. Then one possible criterion for a "good boundary" is a weighted sum of high cumulative edge strength and low cumulative curvature; that is, for an $n$-segment curve,

$$h(x_1, \ldots, x_n) = \sum_{k=1}^{n} s(x_k) + \alpha \sum_{k=1}^{n-1} q(x_k, x_{k+1}) \tag{4.16}$$

where the implicit constraint is that consecutive $x_k$'s must be grid neighbors:

$$\|x_k - x_{k+1}\| \le \sqrt{2} \tag{4.17}$$

$$q(x_k, x_{k+1}) = \operatorname{diff}[\phi(x_k), \phi(x_{k+1})] \tag{4.18}$$

where $\alpha$ is negative. The function $g$ we take to be edge strength, i.e., $g(x) = s(x)$. Notice that this evaluation function is in the form of (4.9) and can be optimized in stages:

$$f_0(x_1) = 0 \tag{4.19}$$

$$f_1(x_2) = \max_{x_1}\left[s(x_1) + \alpha\, q(x_1, x_2) + f_0(x_1)\right] \tag{4.20}$$

$$f_k(x_{k+1}) = \max_{x_k}\left[s(x_k) + \alpha\, q(x_k, x_{k+1}) + f_{k-1}(x_k)\right] \tag{4.21}$$

These equations can be put into the following steps:

Algorithm 4.5: Dynamic Programming for Edge Finding

1. Set $k = 1$.
2. Consider only $x$ such that $s(x) \ge T$. For each of these $x$, define low-curvature pixels "in front of" the contour direction.
3. Each of these pixels may have a curve emanating from it. For $k = 1$, the curve is one pixel in length. Join the curve to $x$ that optimizes the left-hand side of the recursion equation.
4. If $k = N$, pick the best $f_{N-1}$ and stop. Otherwise, set $k = k + 1$ and go to step 2.
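The following Python sketch applies recursion (4.19) to (4.21) under a simplifying assumption that is not part of Algorithm 4.5: the boundary crosses each image row exactly once, so $x_k$ lies in row $k$ and consecutive columns differ by at most 1, which keeps consecutive pixels grid neighbors. Here `s` is the edge magnitude, `phi` the edge direction in radians, and `diff` is taken to be the smallest absolute angular difference (an assumed definition). The threshold test of step 2 is omitted. All names are hypothetical.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles (radians)."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def dp_boundary_rows(s, phi, alpha=-1.0):
    """Maximize sum s(x_k) + alpha * sum q(x_k, x_{k+1}) over curves with
    one pixel per row; consecutive columns differ by at most 1."""
    rows, cols = len(s), len(s[0])
    f = [0.0] * cols                       # f_0(x_1) = 0 for each column of row 0
    back = []                              # back[k][c']: best column in row k
    for k in range(rows - 1):
        new_f = [-math.inf] * cols
        arg = [0] * cols
        for c_next in range(cols):         # candidate x_{k+1} in row k+1
            for c in range(max(0, c_next - 1), min(cols, c_next + 2)):
                val = (s[k][c]
                       + alpha * angle_diff(phi[k][c], phi[k + 1][c_next])
                       + f[c])             # bracketed term of Eq. (4.21)
                if val > new_f[c_next]:
                    new_f[c_next], arg[c_next] = val, c
        f = new_f
        back.append(arg)
    # (4.16) also counts s at the last pixel, which the recursion has not added
    last = max(range(cols), key=lambda c: f[c] + s[rows - 1][c])
    best = f[last] + s[rows - 1][last]
    path = [last]
    for arg in reversed(back):             # trace back through the tables
        path.append(arg[path[-1]])
    return best, path[::-1]

s = [[0, 5, 0], [0, 5, 0], [0, 0, 5]]
phi = [[0.0] * 3 for _ in range(3)]
print(dp_boundary_rows(s, phi))  # (15.0, [1, 1, 2])
```

With $\alpha < 0$, a direction change between consecutive pixels lowers the score, so a weaker but smoother path can win over a stronger but sharply bending one.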

This algorithm can be generalized to the case of picking a curve emanating from $x$ (that we have already generated): Find the end of that curve, and join the best of three curves emanating from the end of that curve. Figure 4.15 shows this process.

Fig. 4.15 DP optimization for boundary tracing.

The equations for the general case are

$$f_0(x_1) = 0$$

$$f_l(x_{k+1}) = \max_{x_k}\left[s(x_k) + \alpha\, q\big(x_k, t(x_{k+1})\big) + f_{l-1}(x_k)\right] \tag{4.22}$$
where the curve length $n$ is related to $l$ by a building sequence $n(l)$ such that $n(1) = 1$, $n(L) = N$, and $n(l) - n(l-1)$ is a member of $\{n(k) \mid k = 1, \ldots, l-1\}$. Also, $t(x_k)$ is a function that extracts the tail pixel of the curve headed by $x_k$. Further details may be found in [Ballard 1976].

Results from the area of tumor detection in radiographs give a sense of this method's performance. Here it is known that the boundary inscribes an approximately circular tumor, so that circular cues can be used to assist the search. In Fig. 4.16, (a) shows the image containing the tumor, (b) shows the cues, and (c) shows the boundary found by dynamic programming overlaid on the image. Another application of dynamic programming may be found in the pseudo-parallel road finder of Barrow [Barrow 1976].
4.5.3 Lower-Resolution Evaluation Functions

In the dynamic programming formulation just developed, the components $g(x_k)$ and $q(x_k, x_{k+1})$ in the evaluation function are very localized; the variables $x$ for successive $g$ and $q$ are in fact constrained to be grid neighbors. This need not be the case: The $x$ can be very distant from each other without altering the basic technique. Furthermore, the functions $g$ and $q$ need not be local gradient and absolute curvature, respectively, but can be any functions defined on permissible $x$. This general formulation of the problem for images was first described by [Fischler and Elschlager 1973]. The Fischler and Elschlager formulation models an object as a set of parts and relations between parts, represented as a graph. Template functions, denoted by $g(x)$, measure how well a part of the model matches a part of the image at the point $x$. (These local functions may be defined in any manner whatsoever.) "Relational functions," denoted by $q_{kj}(x, y)$, measure how well the position of the match of the $k$th part at $(x)$ agrees with the position of the match of the $j$th part at $(y)$.

The basic notions are shown by a technique simplified from [Chien and Fu 1974] to find the boundaries of lungs in chest films. The lung boundaries are modeled with a polygonal approximation defined by the five key points. These points are the top of the lung, the two clavicle-lung junctions, and the two lower corners. To locate these points, local functions $g(x_k)$ are defined which should be maximized when the corresponding point $x_k$ is correctly determined. Similarly, $q(x_k, x_j)$ is a function relating points $x_k$ and $x_j$. In their case, Chien and Fu used the following functions:
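$T(x)$ = template centered at $x$ computed as an aggregate of a set of chest radiographs

$$g(x_k) = \frac{\sum_{x} T(x - x_k)\, f(x)}{\left[\sum_{x} f^2(x)\right]^{1/2} \left[\sum_{x} T^2(x)\right]^{1/2}}$$

and

$\theta(x_k, x_j)$ = expected angular orientation of $x_k$ from $x_j$

$$q(x_k, x_j) = \left|\theta(x_k, x_j) - \arctan\frac{y_k - y_j}{x_k - x_j}\right|$$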
With this formulation no further modifications are necessary and the solution may be obtained by solving Eqs. (4.19) through (4.21), as before. For purposes of comparison, this method was formalized using a lower-resolution objective function. Figure 4.17 shows Chien and Fu's results using this method with five template functions.
4.5.4 Theoretical Questions about Dynamic Programming

The Interaction Graph

This graph describes the interdependence of variables in the objective function. In the examples the interaction graph was simple: Each variable depended on only two others, resulting in the graph of Fig. 4.18a. A more complicated case is the one in Fig. 4.18b, which describes an objective function of the following form:

$$h(\cdot) = h_1(x_1, x_2) + h_2(x_2, x_3, x_4) + h_3(x_3, x_4, x_5, x_6)$$

For these cases the dynamic programming technique still applies, but the computational effort increases exponentially with the number of interdependencies. For example, to eliminate $x_2$ in $h_2$, all possible combinations of $x_3$ and $x_4$ must be considered. To eliminate $x_3$ in $h_3$, all possible combinations of $x_4$, $x_5$, and $x_6$ must be considered, and so forth.

Dynamic Programming versus Heuristic Search

It has been shown [Martelli 1976] that for finding a path in a graph between two points, which is an abstraction of the work we are doing here, heuristic search methods can be more efficient than dynamic programming methods. However, the point to remember about dynamic programming is that it efficiently builds paths from multiple starting points. If this is required by a particular task, then dynamic programming would be the method of choice, unless a very powerful heuristic were available.

4.6 CONTOUR FOLLOWING

If nothing is known about the boundary shape, but regions have been found in the image, the boundary is recovered by one of the simplest edge-following operations: "blob finding" in images. The ideas are easiest to present for binary images:
Fig. 4.17 Results of using local templates and global relations. (a) Model. (b) Results.

Given a binary image, the goal is to find the boundaries of all distinct regions in the image. This can be done simply by a procedure that functions like Papert's turtle [Papert 1973; Duda and Hart 1973]:

1. Scan the image until a region pixel is encountered.
2. If it is a region pixel, turn left and step; else, turn right and step.
3. Terminate upon return to the starting pixel.

Figure 4.19 shows the path traced out by the procedure. This procedure requires the region to be four-connected for a consistent boundary. Parts of an eight-connected region can be missed. Also, some bookkeeping is necessary to generate an exact sequence of boundary pixels without duplications.

A slightly more elaborate algorithm due to [Rosenfeld 1968] generates the boundary pixels exactly. It works by first finding a four-connected background pixel from a known boundary pixel. The next boundary pixel is the first pixel encountered when the eight neighbors are examined in a counterclockwise order from the background pixel. Many details have to be introduced into algorithms that follow contours of irregular eight-connected figures. A good exposition of these is given in [Rosenfeld and Kak 1976].
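A minimal Python sketch of the three turtle steps, assuming the binary image is a list of rows of 0/1 values and that the turtle enters the first region pixel heading east, as a left-to-right raster scan would. The duplicate-removal bookkeeping mentioned above is deliberately omitted, so a pixel may be listed more than once on thin regions. The function name is hypothetical.

```python
def turtle_trace(img):
    """Trace the boundary of the first region met by a raster scan.

    img: 2-D list of 0/1 values. Returns region pixels visited, in order.
    """
    rows, cols = len(img), len(img[0])

    def is_region(r, c):
        return 0 <= r < rows and 0 <= c < cols and img[r][c] == 1

    # 1. Scan the image until a region pixel is encountered.
    start = next(((r, c) for r in range(rows) for c in range(cols)
                  if img[r][c] == 1), None)
    if start is None:
        return []
    moves = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # N, E, S, W (clockwise order)
    heading = 1                                # raster scan arrives moving east
    r, c = start
    boundary = []
    while True:
        # 2. On a region pixel turn left, otherwise turn right; then step.
        if is_region(r, c):
            boundary.append((r, c))
            heading = (heading - 1) % 4
        else:
            heading = (heading + 1) % 4
        r, c = r + moves[heading][0], c + moves[heading][1]
        # 3. Terminate upon return to the starting pixel.
        if (r, c) == start:
            return boundary

img = [[0, 0, 0, 0],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
print(turtle_trace(img))  # [(1, 1), (1, 2), (2, 2), (2, 1)]
```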
4.6.1 Extension to Gray-Level Images

The main idea behind contour following is to start with a point that is believed to be on the boundary and to keep extending the boundary by adding points in the contour directions. The details of these operations vary from task to task. The generalization of the contour follower to gray-level images uses local gradients with a magnitude $s(x)$ and direction $\phi(x)$ associated with each point $x$; $\phi$ points in the direction of maximum change. If $x$ is on the boundary of an image object, neighboring points on the boundary should be in the general direction of the contour directions, $\phi(x) \pm \pi/2$, as shown by Fig. 4.20. A representative procedure is adapted from [Martelli 1976]:

1. Assume that an edge has been detected up to a point $x_i$. Move to the point $x_j$ adjacent to $x_i$ in the direction perpendicular to the gradient of $x_i$. Apply the gradient operator to $x_j$; if its magnitude is greater than (some) threshold, this point is added to the edge.
2. Otherwise, compute the average gray level of the $3 \times 3$ array centered on $x_j$, compare it with a suitably chosen threshold, and determine whether $x_j$ is inside or outside the object.
3. Make another attempt with a point $x_k$ adjacent to $x_i$ in the direction perpendicular to the gradient at $x_i$ plus or minus $\pi/4$, according to the outcome of the previous test.

Fig. 4.18 Interaction graphs for DP (see text).

Fig. 4.19 Finding the boundary in a binary image.

Fig. 4.20 Angular orientations for contour following (local edge and search space).

4.6.2 Generalization to Higher-Dimensional Image Data

The generalization of contour following to higher-dimensional spaces is straightforward [Liu 1977; Herman and Liu 1978]. The search involved is, in fact, slightly more complex than contour following and is more like the graph-searching methods described in Section 4.4. Higher-dimensional image spaces arise when the image has more than two spatial dimensions, is time-varying, or both. In these images the notion of a gradient is the same (a vector describing the maximum gray-level change and its corresponding direction), but the intuitive interpretation of the corresponding edge element may be difficult. In three dimensions, edge elements are primitive surface elements, separating volumes of differing gray level. The objective of contour following is to link together neighboring surface elements with high gradient modulus values and similar orientations into larger boundaries. In four dimensions, "edge elements" are primitive volumes; contour following links neighboring volumes with similar gradients.

The contour-following approach works well when there is little noise present and no "spurious" boundaries. Unfortunately, if either of these conditions is violated, the contour-following algorithms are generally unsatisfactory; they are easily thwarted by gaps in the data produced by noise, and readily follow spurious boundaries. The methods described earlier in this chapter attempt to overcome these difficulties through more elaborate models of the boundary structure.

EXERCISES

4.1 Specify a heuristic search algorithm that will work with "crack" edges such as those in Fig. 3.12.
4.2 Describe a modification of Algorithm 4.2 to detect parabolae in gray-level images.
4.3 Suppose that a relation $h(x_1, x_4)$ is added to the model described by Fig. 4.18a so that now the interaction graph is cyclical. Show formally how this changes the optimization steps described by Eqs. (4.11) through (4.13).
4.4 Show formally that the Hough technique without gradient direction information is equivalent to template matching (Chapter 3).

4.5 Extend the Hough technique for ellipses described by Eqs. (4.7a) through (4.7d) to ellipses oriented at an arbitrary angle $\theta$ to the $x$-axis.
4.6 Show how to use the generalized Hough technique to detect hexagons.

REFERENCES

ASHKAR, G.P. and J.W. MODESTINO. "The contour extraction problem with biomedical applications." CGIP 7, 1978, 331-355.
BALLARD, D.H. Hierarchic detection of tumors in chest radiographs. Basel: Birkhauser-Verlag (ISR-16), January 1976.
BALLARD, D.H. "Generalizing the Hough transform to detect arbitrary shapes." Pattern Recognition 13, 2, 1981, 111-122.
BALLARD, D.H. and J. SKLANSKY. "A ladder-structured decision tree for recognizing tumors in chest radiographs." IEEE Trans. Computers 25, 1976, 503-513.
BALLARD, D.H., M. MARINUCCI, F. PROIETTI-ORLANDI, A. ROSSI-MARI, and L. TENTARI. "Automatic analysis of human haemoglobin fingerprints." Proc., 3rd Meeting, International Society of Haematology, London, August 1975.
BARROW, H.G. "Interactive aids for cartography and photo interpretation." Semi-Annual Technical Report, AI Center, SRI International, December 1976.
BELLMAN, R. and S. DREYFUS. Applied Dynamic Programming. Princeton, NJ: Princeton University Press, 1962.
BOLLES, R. "Verification vision for programmable assembly." Proc., 5th IJCAI, August 1977, 569-575.
CHIEN, Y.P. and K.S. FU. "A decision function method for boundary detection." CGIP 3, 2, June 1974, 125-140.
DUDA, R.O. and P.E. HART. "Use of the Hough transformation to detect lines and curves in pictures." Commun. ACM 15, 1, January 1972, 11-15.
DUDA, R.O. and P.E. HART. Pattern Classification and Scene Analysis. New York: Wiley, 1973.
FISCHLER, M.A. and R.A. ELSCHLAGER. "The representation and matching of pictorial structures." IEEE Trans. Computers 22, January 1973.
HERMAN, G.T. and H.K. LIU. "Dynamic boundary surface detection." CGIP 7, 1978, 130-138.
HOUGH, P.V.C. "Method and means for recognizing complex patterns." U.S. Patent 3,069,654; 1962.
KELLY, M.D. "Edge detection by computer using planning." In MI6, 1971.
KIMME, C., D. BALLARD, and J. SKLANSKY. "Finding circles by an array of accumulators." Commun. ACM 18, 2, 1975, 120-122.
LANTZ, K.A., C.M. BROWN, and D.H. BALLARD. "Model-driven vision using procedure description: motivation and application to photointerpretation and medical diagnosis." Proc., 22nd International Symp., Society of Photo-optical Instrumentation Engineers, San Diego, CA, August 1978.
LESTER, J.M., H.A. WILLIAMS, B.A. WEINTRAUB, and J.F. BRENNER. "Two graph searching techniques for boundary finding in white blood cell images." Computers in Biology and Medicine 8, 1978, 293-308.
LIU, H.K. "Two- and three-dimensional boundary detection." CGIP 6, 2, April 1977, 123-134.
MARR, D. "Analyzing natural images: a computational theory of texture vision." Technical Report 334, AI Lab, MIT, June 1975.
MARTELLI, A. "Edge detection using heuristic search methods." CGIP 1, 2, August 1972, 169-182.
MARTELLI, A. "An application of heuristic search methods to edge and contour detection." Commun. ACM 19, 2, February 1976, 73-83.
MONTANARI, U. "On the optimal detection of curves in noisy pictures." Commun. ACM 14, 5, May 1971, 335-345.
NILSSON, N.J. Problem-Solving Methods in Artificial Intelligence. New York: McGraw-Hill, 1971.
NILSSON, N.J. Principles of Artificial Intelligence. Palo Alto, CA: Tioga, 1980.
PAPERT, S. "Uses of technology to enhance education." Technical Report 298, AI Lab, MIT, 1973.
PERSOON, E. "A new edge detection algorithm and its applications in picture processing." CGIP 5, 4, December 1976, 425-446.
RAMER, U. "Extraction of line structures from photographs of curved objects." CGIP 4, 2, June 1975, 81-103.
ROSENFELD, A. Picture Processing by Computer. New York: Academic Press, 1968.
ROSENFELD, A. and A.C. KAK. Digital Picture Processing. New York: Academic Press, 1976.
SELFRIDGE, P.G., J.M.S. PREWITT, C.R. DYER, and S. RANADE. "Segmentation algorithms for abdominal computerized tomography scans." Proc., 3rd COMPSAC, November 1979, 571-577.
WECHSLER, H. and J. SKLANSKY. "Finding the rib cage in chest radiographs." Pattern Recognition 9, 1977, 21-30.


Region Growing
5.1 REGIONS

Chapter 4 concentrated on the linear features (discontinuities of image gray level) that often correspond to object boundaries, interesting surface detail, and so on. The "dual" problem to finding edges around regions of differing gray level is to find the regions themselves. The goal of region growing is to use image characteristics to map individual pixels in an input image to sets of pixels called regions. An image region might correspond to a world object or a meaningful part of one.

Of course, very simple procedures will derive a boundary from a connected region of pixels, and conversely can fill a boundary to obtain a region. There are several reasons why both region growing and line finding survive as basic segmentation techniques despite their redundant-seeming nature. Although perfect regions and boundaries are interconvertible, the processing to find them initially differs in character and applicability; besides, perfect edges or regions are not always required for an application. Region-finding and line-finding techniques can cooperate to produce a more reliable segmentation.

The geometric characteristics of regions depend on the domain. Usually, they are considered to be connected two-dimensional areas. Whether regions can be disconnected, non-simply connected (have holes), should have smooth boundaries, and so forth depends on the region-growing technique and the goals of the work. Ultimately, it is often the segmentation goal to partition the entire image into quasi-disjoint regions. That is, regions have no two-dimensional overlaps, and no pixel belongs to the interior of more than one region. However, there is no single definition of region: they may be allowed to overlap, the whole image may not be partitioned, and so forth.

Our discussion of region growers will begin with the most simple kinds and progress to the more complex. The most primitive region growers use only aggregates of properties of local groups of pixels to determine regions. More sophisticated techniques "grow" regions by merging more primitive regions. To do this in a structured way requires sophisticated representations of the regions and boundaries. Also, the merging decisions can be complex, and can depend on descriptions of the boundary structure separating regions in addition to the region semantics. A good survey of early techniques is [Zucker 1976]. The techniques we consider are:

1. Local techniques. Pixels are placed in a region on the basis of their properties or the properties of their close neighbors.
2. Global techniques. Pixels are grouped into regions on the basis of the properties of large numbers of pixels distributed throughout the image.
3. Splitting and merging techniques. The foregoing techniques are related to individual pixels or sets of pixels. State-space techniques merge or split regions using graph structures to represent the regions and boundaries. Both local and global merging and splitting criteria can be used.

The effectiveness of region-growing algorithms depends heavily on the application area and input image. If the image is sufficiently simple, say a dark blob on a light background, simple local techniques can be surprisingly effective. However, on very difficult scenes, such as outdoor scenes, even the most sophisticated techniques still may not produce a satisfactory segmentation. In this event, region growing is sometimes used conservatively to preprocess the image for more knowledgeable processes [Hanson and Riseman 1978].

In discussing the specific algorithms, the following definitions will be helpful. Regions $R_k$ are considered to be sets of points with the following properties:

$x_i$ in a region $R$ is connected to $x_j$ iff there is a sequence $\{x_i, \ldots, x_j\}$ such that $x_k$ and $x_{k+1}$ are connected and all the points are in $R$. (5.1)

$R$ is a connected region if the set of points $x$ in $R$ has the property that every pair of points is connected. (5.2)

$$\bigcup_{k=1}^{m} R_k = I, \text{ the entire image} \tag{5.3}$$

$$R_i \cap R_j = \emptyset, \quad i \ne j \tag{5.4}$$

A set of regions satisfying (5.2) through (5.4) is known as a partition. In segmentation algorithms, each region often is a unique, homogeneous area. That is, for some Boolean function $H(R)$ that measures region homogeneity,

$$H(R_k) = \text{true for all } k \tag{5.5}$$

$$H(R_i \cup R_j) = \text{false for } i \ne j \tag{5.6}$$

Note that $R_i \cup R_j$ does not have to be connected. A weaker but still useful criterion is that neighboring regions not be homogeneous.



5.2 A LOCAL TECHNIQUE: BLOB COLORING

The counterpart to the edge tracker for binary images is the blob-coloring algorithm. Given a binary image containing four-connected blobs of 1's on a background of 0's, the objective is to "color each blob"; that is, assign each blob a different label. To do this, scan the image from left to right and top to bottom with a special L-shaped template shown in Fig. 5.1. The coloring algorithm is as follows.

Algorithm 5.1: Blob Coloring

Let the initial color, k = 1.
Scan the image from left to right and top to bottom.
If f(x_C) = 0 then continue
else begin
    if (f(x_U) = 1 and f(x_L) = 0) then color(x_C) := color(x_U)
    if (f(x_L) = 1 and f(x_U) = 0) then color(x_C) := color(x_L)
    if (f(x_L) = 1 and f(x_U) = 1) then begin
        color(x_C) := color(x_L)
        color(x_L) is equivalent to color(x_U)
    end
    comment: two colors are equivalent.
    if (f(x_L) = 0 and f(x_U) = 0) then color(x_C) := k, k := k + 1
    comment: new color
end

After one complete scan of the image the color equivalences can be used to assure that each object has only one color. This binary image algorithm can be used as a simple region grower for gray-level images with the following modifications. If in a gray-level image $f(x_C)$ is approximately equal to $f(x_U)$, assign $x_C$ to the same region (blob) as $x_U$. This is equivalent to the condition $f(x_C) = f(x_U) = 1$ in Algorithm 5.1. The modifications to the steps in the algorithm are straightforward.

Fig. 5.1 L-shaped template for blob coloring.
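A Python sketch of Algorithm 5.1, including the gray-level modification. It assumes a union-find structure for recording color equivalences (one common choice, not prescribed by the algorithm) and a second raster pass that replaces every color by its class representative. The parameter `tol` is a hypothetical stand-in for "approximately equal": when it is `None` the image is treated as binary.

```python
def blob_color(img, tol=None):
    """Two-pass blob coloring with the L-shaped template (x_U above, x_L left).

    img: 2-D list. With tol=None the image is binary (1 = blob pixel);
    otherwise x_C joins a neighbor's region when gray levels differ by <= tol.
    Returns a 2-D list of colors (0 marks background in the binary case).
    """
    rows, cols = len(img), len(img[0])
    color = [[0] * cols for _ in range(rows)]
    parent = {}                                  # union-find over colors

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]        # path halving
            k = parent[k]
        return k

    def same(a, b):
        if tol is None:
            return a == 1 and b == 1
        return abs(a - b) <= tol

    k = 1
    for r in range(rows):
        for c in range(cols):
            if tol is None and img[r][c] == 0:
                continue
            up = r > 0 and same(img[r][c], img[r - 1][c])
            left = c > 0 and same(img[r][c], img[r][c - 1])
            if up and left:
                color[r][c] = color[r][c - 1]
                a, b = find(color[r][c - 1]), find(color[r - 1][c])
                parent[max(a, b)] = min(a, b)    # record the equivalence
            elif up:
                color[r][c] = color[r - 1][c]
            elif left:
                color[r][c] = color[r][c - 1]
            else:
                color[r][c] = k                  # new color
                parent[k] = k
                k += 1
    # Second pass: give every object a single color.
    for r in range(rows):
        for c in range(cols):
            if color[r][c]:
                color[r][c] = find(color[r][c])
    return color

print(blob_color([[1, 0, 1], [1, 0, 1], [1, 1, 1]]))
# [[1, 0, 1], [1, 0, 1], [1, 1, 1]]
```

In the U-shaped example the two arms first receive colors 1 and 2; the equivalence recorded at the bottom-right pixel merges them in the second pass.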

5.3 GLOBAL TECHNIQUES: REGION GROWING VIA THRESHOLDING

This approach assumes an object-background image and picks a threshold that divides the image pixels into either object or background:

$x$ is part of the Object iff $f(x) > T$
Otherwise it is part of the Background

The best way to pick the threshold $T$ is to search the histogram of gray levels, assuming it is bimodal, and find the minimum separating the two peaks, as in Fig. 5.2. Finding the right valley between the peaks of a histogram can be difficult when the histogram is not a smooth function. Smoothing the histogram can help but does not guarantee that the correct minimum can be found. An elegant method for treating bimodal images assumes that the histogram is the sum of two composite normal functions and determines the valley location from the normal parameters [Chow and Kaneko 1972].

The single-threshold method is useful in simple situations, but primitive. For example, the region pixels may not be connected, and further processing such as that described in Chapter 2 may be necessary to smooth region boundaries and remove noise. A common problem with this technique occurs when the image has a background of varying gray level, or when collections we would like to call regions vary smoothly in gray level by more than the threshold. Two modifications of the threshold approach to ameliorate the difficulty are: (1) high-pass filter the image to de-emphasize the low-frequency background variation and then try the original technique; and (2) use a spatially varying threshold method such as that of [Chow and Kaneko 1972].

The Chow-Kaneko technique divides the image up into rectangular subimages and computes a threshold for each subimage. A subimage can fail to have a threshold if its gray-level histogram is not bimodal. Such subimages receive interpolated thresholds from neighboring subimages that are bimodal, and finally the entire picture is thresholded by using the separate thresholds for each subimage.

Fig. 5.2 Threshold determination from gray-level histogram (number of pixels versus gray level, with the threshold marked).
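A minimal Python sketch of choosing $T$ at the histogram valley, assuming integer gray levels in $[0, 256)$ and a roughly bimodal image. It smooths the histogram with a moving average (an assumed choice; the text only says smoothing helps), takes the two highest local maxima as the peaks, and returns the lowest smoothed count between them. This is the simple valley search, not the Chow-Kaneko normal-fitting method. The function and parameter names are hypothetical.

```python
def valley_threshold(pixels, levels=256, window=5):
    """Pick T at the deepest point between the two largest histogram peaks.

    pixels: iterable of integer gray levels in [0, levels).
    Returns T, or None if fewer than two peaks are found (not bimodal).
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Moving-average smoothing; `window` is a free parameter of this sketch.
    half = window // 2
    smooth = []
    for i in range(levels):
        lo, hi = max(0, i - half), min(levels, i + half + 1)
        smooth.append(sum(hist[lo:hi]) / (hi - lo))
    # Interior local maxima of the smoothed histogram.
    peaks = [i for i in range(1, levels - 1)
             if smooth[i - 1] < smooth[i] >= smooth[i + 1]]
    if len(peaks) < 2:
        return None
    p1, p2 = sorted(sorted(peaks, key=lambda i: smooth[i])[-2:])
    # Valley: first lowest smoothed count between the two dominant peaks.
    return min(range(p1, p2 + 1), key=lambda i: smooth[i])

pixels = [50] * 100 + [52] * 80 + [200] * 120 + [198] * 60 + [120] * 5
T = valley_threshold(pixels)
print(T)  # 55
```

Pixels with $f(x) > T$ are then labeled Object. Because the minimum is taken over the smoothed histogram, the isolated small bump at level 120 does not become the threshold.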
5.3.1 Thresholding in Multidimensional Space

An interesting variation to the basic thresholding paradigm uses color images; the basic digital picture function is vector-valued with red, blue, and green components. This vector is augmented with possibly nonlinear combinations of these values so that the augmented picture vector has a number of components. The idea is to re-represent the color solid redundantly and hope to find color parameters for which thresholding does the desired segmentation. One implementation of this idea used the red, green, and blue color components; the intensity, saturation, and hue components; and the N.T.S.C. Y, I, Q components (Chapter 2) [Ohlander et al. 1979].

The idea of thresholding the components of a picture vector is used in a primitive form for multispectral LANDSAT imagery [Robertson et al. 1973]. The novel extension in this algorithm is the recursive application of this technique to nonrectangular subregions. The region partitioning is then as follows:
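As an illustration of a redundant picture vector, the Python sketch below builds nine components for one pixel using the standard-library `colorsys` module. Its HSV hue and saturation and its YIQ coefficients are one common convention, and the intensity as the mean of R, G, and B is an assumed definition; Ohlander et al. may have used different formulas. The function name is hypothetical.

```python
import colorsys

def augmented_vector(r, g, b):
    """Redundant color description of one pixel; r, g, b are in [0, 1]."""
    y, i, q = colorsys.rgb_to_yiq(r, g, b)         # colorsys NTSC YIQ conversion
    hue, sat, _value = colorsys.rgb_to_hsv(r, g, b)
    return {
        "red": r, "green": g, "blue": b,
        "intensity": (r + g + b) / 3.0,   # assumed definition: mean of R, G, B
        "hue": hue, "saturation": sat,    # colorsys HSV convention, hue in [0, 1)
        "Y": y, "I": i, "Q": q,
    }

vec = augmented_vector(1.0, 0.0, 0.0)     # pure red
print(vec["hue"], vec["saturation"])      # 0.0 1.0
```

Computing one histogram per key over all pixels gives the set of component histograms that a recursive splitting procedure would search for peaks.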

Algorithm 5.2: Region Growing via Recursive Splitting

1. Consider the entire image as a region and compute histograms for each of the picture vector components.
2. Apply a peak-finding test to each histogram. If at least one component passes the test, pick the component with the most significant peak and determine two thresholds, one on either side of the peak (Fig. 5.3). Use these thresholds to divide the region into subregions.
3. Each subregion may have a "noisy" boundary, so the binary representation of the image achieved by thresholding is smoothed so that only a single connected subregion remains. For binary smoothing see Ch. 8 and [Rosenfeld and Kak 1976].
4. Repeat steps 1 through 3 for each subregion until no new subregions are created (no histograms have significant peaks).

A refinement of step 2 of this scheme is to create histograms in higher-dimensional space [Hanson and Riseman 1978]. Multiple regions are often in the same histogram peak when a single measurement is used. The advantage of the multimeasurement histograms is that these different regions are often separated into individual peaks, and hence the segmentation is improved. Figure 5.4 shows some results using a three-dimensional RGB color space. The figure shows the clear separation of peaks in the three-dimensional histogram that is not evident in either of the one-dimensional histograms. How many dimensions should be used? Obviously, there is a tradeoff here: As the dimensionality becomes larger, the discrimination improves, but the histograms are more expensive to compute and noise effects may be more pronounced.

Fig. 5.3 Peak detection and threshold determination. (a) Original image. (b) Histograms of the red, green, blue, intensity, hue, saturation, Y, I, and Q components. (c) Image segments resulting from first histogram peak. (d) Final segments.
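To make the separation argument concrete, the Python sketch below compares a one-dimensional histogram with a joint RGB histogram quantized to 16 cells per axis (the $16 \times 16 \times 16$ resolution mentioned for Fig. 5.4). The two synthetic "regions" share the same red and green values but differ in blue. The pixel values and function names are invented for illustration.

```python
from collections import Counter

def rgb_histogram(pixels, bins=16, levels=256):
    """Joint R, G, B histogram with `bins` cells per axis."""
    width = levels // bins
    return Counter((r // width, g // width, b // width) for r, g, b in pixels)

def channel_histogram(pixels, channel, bins=16, levels=256):
    """One-dimensional histogram of a single component (0 = R, 1 = G, 2 = B)."""
    width = levels // bins
    return Counter(p[channel] // width for p in pixels)

# Two hypothetical regions: same red and green, different blue.
px = [(200, 40, 40)] * 50 + [(200, 40, 200)] * 50
print(len(channel_histogram(px, 0)), len(rgb_histogram(px)))  # 1 2
```

The red histogram shows a single peak containing both regions, while the joint histogram places them in two separate cells.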
5.3.2 Hierarchical Refinement

This technique uses a pyramidal image representation (Section 3.7) [Harlow and Eisenbeis 1973]. Region growing is applied to a coarse-resolution image. When the algorithm has terminated at one resolution level, the pixels near the boundaries of regions are disassociated from their regions. The region-growing process is then repeated for just these pixels at a higher-resolution level. Figure 5.5 shows this structure.

5.4 SPLITTING AND MERGING

Given a set of regions $R_k$, $k = 1, \ldots, m$, a low-level segmentation might require the basic properties described in Section 5.1 to hold. The important properties from the standpoint of segmentation are Eqs. (5.5) and (5.6). If Eq. (5.5) is not satisfied for some $k$, it means that that region is inhomogeneous and should be split into subregions. If Eq. (5.6) is not satisfied for some $i$ and $j$, then regions $i$ and $j$ are collectively homogeneous and should be merged into a single region.

In our previous discussions we used

$$H(R) = \begin{cases} \text{true} & \text{if all neighboring pairs of points in } R \text{ are such that } |f(x) - f(y)| < T \\ \text{false} & \text{otherwise} \end{cases} \tag{5.7}$$

and

$$H(R) = \begin{cases} \text{true} & \text{if the points in } R \text{ pass a bimodality or peak test} \\ \text{false} & \text{otherwise} \end{cases} \tag{5.8}$$

Fig. 5.4 Multidimensional histograms in segmentation. (a) Image. (b) RGB histogram showing successive planes through a 16 × 16 × 16 color space. (c) Segments. (See color inserts.)

Fig. 5.5 Hierarchical region refinement.

A way of working toward the satisfaction of these homogeneity criteria is the split-and-merge algorithm [Horowitz and Pavlidis 1974]. To use the algorithm it is necessary to organize the image pixels into a pyramidal grid structure of regions. In this grid structure, regions are organized into groups of four. Any region can be split into four subregions (except a region consisting of only one pixel), and the appropriate groups of four can be merged into a single larger region. This structure is incorporated into the following region-growing algorithm.

Algorithm 5.3: Region Growing via Split and Merge [Horowitz and Pavlidis 1974]

1. Pick any grid structure, and homogeneity property $H$. If for any region $R$ in that structure, $H(R) = \text{false}$, split that region into four subregions. If for any four appropriate regions $R_{k1}, \ldots, R_{k4}$, $H(R_{k1} \cup R_{k2} \cup R_{k3} \cup R_{k4}) = \text{true}$, merge them into a single region. When no regions can be further split or merged, stop.
2. If there are any neighboring regions $R_i$ and $R_j$ (perhaps of different sizes) such that $H(R_i \cup R_j) = \text{true}$, merge these regions.
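A compact Python sketch of Algorithm 5.3, under assumptions that are labeled here and in the code. The image is square with a power-of-two side. The initial grid structure is the whole image, so step 1 reduces to recursive quadtree splitting: a parent split because it is inhomogeneous can never have four homogeneous-as-a-group children to re-merge. $H(R)$ is taken to be "gray-level range of $R$ at most $T$", a hypothetical predicate chosen because the range of a union follows from the ranges of its parts. Step 2 then merges neighboring regions of any size through a union-find structure.

```python
def split_and_merge(img, T):
    """Split-and-merge sketch; img is square with a power-of-two side.

    Assumed homogeneity predicate: H(R) is true when max(R) - min(R) <= T.
    Returns a 2-D list of region labels.
    """
    n = len(img)
    labels = [[0] * n for _ in range(n)]
    stats = {}                                   # label -> (min, max) gray level

    # Step 1: split any quadrant for which H is false.
    def split(r, c, size):
        block = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
        lo, hi = min(block), max(block)
        if hi - lo <= T or size == 1:
            k = len(stats) + 1
            stats[k] = (lo, hi)
            for i in range(r, r + size):
                for j in range(c, c + size):
                    labels[i][j] = k
        else:
            h = size // 2
            for dr, dc in ((0, 0), (0, h), (h, 0), (h, h)):
                split(r + dr, c + dc, h)

    split(0, 0, n)
    parent = {k: k for k in stats}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    # Step 2: merge neighboring regions whose union is still homogeneous.
    # Under this H a union's range only grows, so a pair rejected once stays
    # rejected and one sweep over adjacent pixel pairs suffices.
    for i in range(n):
        for j in range(n):
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < n and nj < n:
                    a, b = find(labels[i][j]), find(labels[ni][nj])
                    if a != b:
                        lo = min(stats[a][0], stats[b][0])
                        hi = max(stats[a][1], stats[b][1])
                        if hi - lo <= T:
                            parent[b] = a
                            stats[a] = (lo, hi)
    return [[find(k) for k in row] for row in labels]

img = [[1, 1, 9, 9],
       [1, 1, 9, 9],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]
print(split_and_merge(img, 0))
# [[1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 1, 1], [1, 1, 1, 1]]
```

The split phase leaves four quadrants. Step 2 then joins the three quadrants of gray level 1 across quadrant boundaries, which the quadtree structure alone could not represent as a single region.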

5.4.1 State-Space Approach to Region Growing

The "classical" state-space approach of artificial intelligence [Nilsson 1971, 1980] was first applied to region growing in [Brice and Fennema 1970] and significantly extended in [Feldman and Yakimovsky 1974]. This approach regards the initial two-dimensional image as a discrete state, where every sample point is a separate region. Changes of state occur when a boundary between regions is either removed or inserted. The problem then becomes one of searching allowable changes in state to find the best partition.
Fig. 5.6 Grid structure for region representation [Brice and Fennema 1970] (+ edge data; 0 gray-level data; remaining points unassigned).

An important part of the state-space approach is the use of data structures to allow regions and boundaries to be manipulated as units. This moves away from earlier techniques, which labeled each individual pixel according to its region. The high-level data structures do away with this expensive practice by representing regions with their boundaries and then keeping track of what happens to these boundaries during split and merge operations.


5.4.2 Low-Level Boundary Data Structures

A useful representation for boundaries allows the splitting and merging of regions to proceed in a simple manner [Brice and Fennema 1970]. This representation introduces the notion of a supergrid $S$ to the image grid $G$. These grids are shown in Fig. 5.6, where the unassigned and + positions correspond to the supergrid and the 0 positions to the subgrid. The representation is assumed to be four-connected (i.e., $\mathbf{x}_1$ is a neighbor of $\mathbf{x}_2$ if $\|\mathbf{x}_1 - \mathbf{x}_2\| \le 1$). With this notation, boundaries of regions are directed crack edges (see Sec. 3.1) at the points marked +. That is, if point $\mathbf{x}_k$ is a neighbor of $\mathbf{x}_j$ and $\mathbf{x}_k$ is in a different region than $\mathbf{x}_j$, insert two edges for the boundaries of the regions containing $\mathbf{x}_j$ and $\mathbf{x}_k$ at the point + separating them, such that each edge traverses its associated region in a counterclockwise sense. This makes merge operations very simple: to merge regions $R_k$ and $R_l$, remove edges of the opposite sense from the boundary, as shown in Fig. 5.7a. Similarly, to split a region along a line, insert edges of the opposite sense in nearby points, as shown in Fig. 5.7b.

The method of [Brice and Fennema 1970] uses three criteria for merging regions, reflecting a transition from local measurements to global measurements. These criteria use measures of boundary strength $s_{ij}$ and $w_{ij}$ defined as

$$s_{ij} = |f(\mathbf{x}_i) - f(\mathbf{x}_j)| \qquad (5.9)$$

$$w_{ij} = \begin{cases} 1 & \text{if } s_{ij} < T_1 \\ 0 & \text{otherwise} \end{cases} \qquad (5.10)$$

Fig. 5.7 Region operations on the grid structure of Fig. 5.6. (a) Merging. (b) Splitting.

where $\mathbf{x}_i$ and $\mathbf{x}_j$ are assumed to be on either side of a crack edge (Chapter 3). The three criteria are applied sequentially in the following algorithm:

Algorithm 5.4: Region Growing via Boundary Melting ($T_k$, $k = 1, 2, 3$ are preset thresholds)

1. For all neighboring pairs of points, remove the boundary between $\mathbf{x}_i$ and $\mathbf{x}_j$ if $w_{ij} = 1$. When no more boundaries can be removed, go to step 2.
2. Remove the boundary between $R_i$ and $R_j$ if
$$\frac{W}{\min(p_i, p_j)} > T_2 \qquad (5.11)$$
where $W$ is the sum of the $w_{ij}$ on the common boundary between $R_i$ and $R_j$, which have perimeters $p_i$ and $p_j$ respectively. When no more boundaries can be removed, go to step 3.
3. Remove the boundary between $R_i$ and $R_j$ if
$$W > T_3 \qquad (5.12)$$
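A minimal sketch of step 1, together with the tests of steps 2 and 3, assuming the image is a list of rows of gray levels. For simplicity it labels pixels directly instead of manipulating the Brice-Fennema supergrid, and it removes all weak crack edges with a union-find structure; both are implementation choices of this sketch, not part of the original method, and the function names are hypothetical. Because $w_{ij}$ depends only on the two pixel values, one pass over all neighbor pairs already reaches the state in which no more boundaries can be removed.

```python
def crack_edges(img):
    """Yield (p, q, s_pq) for every 4-neighbor pixel pair, with s from Eq. (5.9)."""
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                yield (r, c), (r, c + 1), abs(img[r][c] - img[r][c + 1])
            if r + 1 < rows:
                yield (r, c), (r + 1, c), abs(img[r][c] - img[r + 1][c])


def melt_weak_boundaries(img, T1):
    """Step 1 of Algorithm 5.4: remove every crack edge whose w = 1.

    Returns a label image in which pixels joined through weak edges share a label.
    """
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for p, q, s in crack_edges(img):
        w = 1 if s < T1 else 0  # Eq. (5.10)
        if w == 1:
            parent[find(p)] = find(q)

    rows, cols = len(img), len(img[0])
    names = {}
    # Number the resulting regions 0, 1, 2, ... in raster order.
    return [[names.setdefault(find((r, c)), len(names)) for c in range(cols)]
            for r in range(rows)]


def phagocyte_test(W, p_i, p_j, T2):
    """Step 2, Eq. (5.11): merge if W / min(p_i, p_j) > T2."""
    return W / min(p_i, p_j) > T2


def weakness_test(W, T3):
    """Step 3, Eq. (5.12) as stated in the text: merge if W > T3."""
    return W > T3


print(melt_weak_boundaries([[0, 0, 9], [0, 0, 9]], 2))
```

On a slow gray-level ramp such as `[[0, 1, 2]]` with `T1 = 2`, step 1 alone melts everything into one region, since only local differences are compared; the more global tests of steps 2 and 3 operate on whole region boundaries instead.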

5.4.3 Graph-Oriented Region Structures

The Brice-Fennema data structure stores boundaries explicitly but does not provide for explicit representation of regions. This is a drawback when regions must be referred to as units. An adjunct scheme of region representation can be developed using graph theory. This scheme represents both regions and their boundaries explicitly, and this facilitates the storing and indexing of their semantic properties.

The scheme is based on a special graph called the region adjacency graph, and its "dual graph." In the region adjacency graph, nodes are regions and arcs exist between neighboring regions. This scheme is useful as a way of keeping track of regions, even when they are inscribed on arbitrary nonplanar surfaces (Chapter 9).

Consider the regions of an image shown in Fig. 5.8a. The region adjacency graph has a node in each region and an arc crossing each separate boundary segment. To allow a uniform treatment of these structures, define an artificial region that surrounds the image. This node is shown in Fig. 5.8b. For regions on a plane, the region adjacency graph is planar (can lie in a plane with no arcs intersecting) and its edges are undirected. The "dual" of this graph is also of interest. To construct the dual of the adjacency graph, simply place nodes in each separate region and connect them with arcs wherever the regions are separated by an arc in the adjacency graph. Figure 5.8c shows that the dual of the region adjacency graph is like the original region boundary map; in Fig. 5.8c each arc may be associated with a specific boundary segment and each node with a junction between three or more boundary segments. By maintaining both the region adjacency graph and its dual, one can merge regions using the following algorithm:

Algorithm 5.5: Merging Using the Region Adjacency Graph and Its Dual

Task: Merge neighboring regions $R_i$ and $R_j$.

Phase 1. Update the region adjacency graph.
1. Place edges between $R_i$ and all neighboring regions of $R_j$ (excluding, of course, $R_i$) that do not already have edges between themselves and $R_i$.
2. Delete $R_j$ and all its associated edges.

Phase 2. Take care of the dual.
1. Delete the edges in the dual corresponding to the borders between $R_i$ and $R_j$.
2. For each of the nodes associated with these edges:
(a) if the resultant degree of the node is less than or equal to 2, delete the node and join the two dangling edges into a single edge.
(b) otherwise, update the labels of the edges that were associated with $j$ to reflect the new region label $i$.
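The sketch below shows Phase 1 of Algorithm 5.5 on a region adjacency graph stored as a dictionary that maps each region label to the set of its neighboring labels. It assumes a label image (a list of rows) with 4-connectivity, omits the artificial surrounding region, and does not maintain the dual graph of Phase 2; the function names are hypothetical.

```python
def region_adjacency_graph(labels):
    """Map each region label to the set of labels of its 4-neighbors."""
    rag = {}
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            rag.setdefault(a, set())
            for rr, cc in ((r + 1, c), (r, c + 1)):  # look down and right only
                if rr < rows and cc < cols and labels[rr][cc] != a:
                    b = labels[rr][cc]
                    rag[a].add(b)
                    rag.setdefault(b, set()).add(a)
    return rag


def merge_regions(rag, i, j):
    """Phase 1 of Algorithm 5.5: merge region j into region i (in place)."""
    for k in rag[j]:
        if k != i:
            rag[k].discard(j)  # step 1: k loses its edge to j ...
            rag[k].add(i)      # ... and gains an edge to i if it had none
            rag[i].add(k)
    rag[i].discard(j)
    del rag[j]                 # step 2: delete R_j and its associated edges
    return rag


labels = [[1, 1, 2],
          [1, 3, 2],
          [3, 3, 2]]
rag = region_adjacency_graph(labels)
print(merge_regions(rag, 1, 3))
```

Using sets makes the "do not already have edges" condition of step 1 automatic, since adding an existing neighbor has no effect.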

Figure 5.9 shows these operations.
5.5 INCORPORATION OF SEMANTICS

Up to this point in our treatment of region growers, domain-dependent "semantics" has not explicitly appeared. In other words, region-merging decisions were based on raw image data and rather weak heuristics of general applicability about the likely shape of boundaries. As in early processing, the use of domain-dependent knowledge can affect region finding. Possible interpretations of regions can affect the splitting and merging process. For example, in an outdoor scene possible region interpretations might be sky, grass, or car. This kind of knowledge is quite separate from but related to measurable region properties such as intensity and hue.

Fig. 5.8 (a) An image partition. (b) The region adjacency graph (solid lines). (c) The dual of the adjacency graph (solid lines).

Fig. 5.9 Merging operations using the region adjacency graph and its dual. (a) Before merging regions separated by dark boundary line. (b) After merging.

An example shows how semantic labels for regions can guide the merging process. This approach was originally developed in [Feldman and Yakimovsky 1974]. It has found application in several complex vision systems [Barrow and Tenenbaum 1977; Hanson and Riseman 1978]. Early steps in the Feldman-Yakimovsky region grower used essentially the same steps as Brice-Fennema. Once regions attain significant size, semantic criteria are used. The region growing consists of four steps, as summed up in the following algorithm:

Algorithm 5.6: Semantic Region Growing

Nonsemantic Criteria ($T_1$ and $T_2$ are preset thresholds)

1. Merge regions $i$, $j$ as long as they have one weak separating edge, until no two regions pass this test.
2. Merge regions $i$, $j$ where $S(i, j) < T_2$, where
$$S(i, j) = \frac{c_1 + a_{ij}}{c_2 + a_{ij}}$$

where $c_1$ and $c_2$ are constants and

$$a_{ij} = \frac{(\text{area}_i)^{1/2} + (\text{area}_j)^{1/2}}{\text{perimeter}_i \cdot \text{perimeter}_j}$$

until no two regions pass this test. (This is a similar criterion to Algorithm 5.4, step 2.)

Semantic Criteria

3. Let $B_{ij}$ be the boundary between $R_i$ and $R_j$. Evaluate each $B_{ij}$ with a Bayesian decision function that measures the (conditional) probability that $B_{ij}$ separates two regions $R_i$ and $R_j$ of the same interpretation. Merge $R_i$ and $R_j$ if this conditional probability is greater than some threshold. Repeat step 3 until no regions pass the threshold test.
4. Evaluate the interpretation of each region $R_i$ with a Bayesian decision function that measures the (conditional) probability that an interpretation is the correct one for that region. Assign the interpretation to the region with the highest confidence of correct interpretation. Update the conditional probabilities for different interpretations of neighbors. Repeat the entire process until all regions have interpretation assignments.

The semantic portion of Algorithm 5.6 had the goal of maximizing an evaluation function measuring the probability of a correct interpretation (labeled partition), given the measurements on the boundaries and regions of the partition. An expression for the evaluation function is (for a given partition and interpretations $X$ and $Y$):

$$\max_{X, Y} \prod_{i,j} \{P[B_{ij} \text{ is a boundary between } X \text{ and } Y \mid \text{measurements on } B_{ij}]\} \times \prod_i \{P[R_i \text{ is an } X \mid \text{measurements on } R_i]\} \times \prod_j \{P[R_j \text{ is a } Y \mid \text{measurements on } R_j]\}$$



where $P$ stands for probability and $\prod$ is the product operator.

How are these terms to be computed? Ideally, each conditional probability function should be known to a reasonable degree of accuracy; then the terms can be obtained by lookup. However, the straightforward computation and representation of the conditional probability functions requires a massive amount of work and storage. An approximation used in [Feldman and Yakimovsky 1974] is to quantize the measurements and represent them in terms of a classification tree. The conditional probabilities can then be computed from data at the leaves of the tree. Figure 5.10 shows a hypothetical tree for the region measurements of intensity and hue, and interpretations ROAD, SKY, and CAR. Figure 5.11 shows the equivalent tree for two boundary measurements $m$ and $n$ and the same interpretations. These two figures give the values of $P[R_i \text{ is a CAR} \mid 0 \le I < I_1, 0 \le h < H_1]$ and $P[B_{ij} \text{ divides two car regions} \mid M_k \le m < M_{k+1}, N_l \le n < N_{l+1}]$, where $I$ is intensity and $h$ is hue. These trees were created by laborious trials with correct segmentations of test images.

Now, finally, consider again step 3 of Algorithm 5.6. The probability that a boundary $B_{ij}$ between regions $R_i$ and $R_j$ is false is given by

$$\frac{P_f}{P_f + P_t} \qquad (5.13)$$

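where

$$P_f = \sum_X \{P[B_{ij} \text{ is between two subregions of } X \mid B_{ij}\text{'s measurements}]\} \times \{P[R_i \text{ is } X \mid \text{meas}]\} \times \{P[R_j \text{ is } X \mid \text{meas}]\} \qquad (5.14a)$$

$$P_t = \sum_{X, Y} \{P[B_{ij} \text{ is between } X \text{ and } Y \mid \text{meas}]\} \times \{P[R_i \text{ is } X \mid \text{meas}]\} \times \{P[R_j \text{ is } Y \mid \text{meas}]\} \qquad (5.14b)$$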
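To make Eqs. (5.13)-(5.14b) concrete, the following sketch computes the false-boundary probability from small probability tables stored in dictionaries. The table layout, the interpretation names in the example, and the choice to sum (5.14b) only over pairs with $X \ne Y$ (so that same-interpretation pairs are counted only in $P_f$) are assumptions of this illustration, and the function name is hypothetical.

```python
def false_boundary_probability(p_boundary, p_ri, p_rj):
    """Eq. (5.13): P_f / (P_f + P_t), with P_f, P_t from Eqs. (5.14a, b).

    p_boundary[(X, Y)]: P[B_ij lies between an X and a Y | boundary meas.];
                        the (X, X) entry means "between two subregions of X".
    p_ri[X], p_rj[X]:   P[R_i is X | meas.] and P[R_j is X | meas.].
    """
    interps = list(p_ri)
    p_f = sum(p_boundary[(x, x)] * p_ri[x] * p_rj[x] for x in interps)
    # Assumption: (5.14b) is summed over X != Y so the two cases are disjoint.
    p_t = sum(p_boundary[(x, y)] * p_ri[x] * p_rj[y]
              for x in interps for y in interps if x != y)
    return p_f / (p_f + p_t)


p_reg = {"SKY": 0.5, "CAR": 0.5}
p_b = {("SKY", "SKY"): 0.8, ("CAR", "CAR"): 0.8,
       ("SKY", "CAR"): 0.2, ("CAR", "SKY"): 0.2}
print(false_boundary_probability(p_b, p_reg, p_reg))  # approximately 0.8
```

Step 3 of Algorithm 5.6 would then merge $R_i$ and $R_j$ when this value exceeds its threshold.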

Fig. 5.10 Hypothetical classification tree for region measurements showing a particular branch for specific ranges of intensity and hue.

Fig. 5.11 Hypothetical classification tree for boundary measurements showing a specific branch for specific ranges of two measurements $m$ and $n$.

And for step 4 of the algorithm,

$$\text{Confidence}_i = \frac{P[R_i \text{ is } X_1 \mid \text{meas}]}{P[R_i \text{ is } X_2 \mid \text{meas}]} \qquad (5.15)$$

where $X_1$, $X_2$ are the first and second most likely interpretations, respectively. After the region is assigned interpretation $X_1$, the neighbors are updated using

$$P[R_j \text{ is } X \mid \text{meas}] := P[R_j \text{ is } X \mid \text{meas}] \times P[B_{ij} \text{ is between } X \text{ and } X_1 \mid \text{meas}]$$

EXERCISES

5.1 In Algorithm 5.1, show how one can handle the case where colors are equivalent. Do you need more than one pass over the image?

5.2 Show for the heuristic of Eq. (5.11) that
(a) $l \ge W > T_2 P_m$ (5.16)
(b) $P_m^* < P_i + l\,(1/T_2 - 2)$
where $P_m^*$ is the perimeter of $R_i \cup R_j$, $l$ is the perimeter common to both $R_i$ and $R_j$, and $P_m = \min(P_i, P_j)$. What does part (b) imply about the relation between $T_2$ and $P_m^*$?

5.3 Write a "histogram peak" finder; that is, detect satisfying valleys in histograms separating intuitive hills or peaks.

5.4 Suppose that regions are represented by a neighbor list structure. Each region has an associated list of neighboring regions. Design a region-merging algorithm based on this structure.

5.5 Why do junctions of regions in segmented images tend to be trihedral?

5.6 Regions, boundaries, and junctions are the structures behind the region adjacency graph and its dual. Generalize these structures to three dimensions. Is another structure needed?

5.7 Generalize the graph of Figure 5.8 to three dimensions and develop the merging algorithm analogous to Algorithm 5.5. (Hint: see Exercise 5.6.)

REFERENCES

BARROW, H. G. and J. M. TENENBAUM. "Experiments in model-driven scene segmentation." Artificial Intelligence 8, 3, June 1977, 241-274.
BRICE, C. and C. FENNEMA. "Scene analysis using regions." Artificial Intelligence 1, 3, Fall 1970, 205-226.
CHOW, C. K. and T. KANEKO. "Automatic boundary detection of the left ventricle from cineangiograms." Computers and Biomedical Research 5, 4, August 1972, 388-410.
FELDMAN, J. A. and Y. YAKIMOVSKY. "Decision theory and artificial intelligence: I. A semantics-based region analyzer." Artificial Intelligence 5, 4, 1974, 349-371.
HANSON, A. R. and E. M. RISEMAN. "Segmentation of natural scenes." In CVS, 1978.
HARLOW, C. A. and S. A. EISENBEIS. "The analysis of radiographic images." IEEE Trans. Computers 22, 1973, 678-688.
HOROWITZ, S. L. and T. PAVLIDIS. "Picture segmentation by a directed split-and-merge procedure." Proc., 2nd IJCPR, August 1974, 424-433.
NILSSON, N. J. Principles of Artificial Intelligence. Palo Alto, CA: Tioga, 1980.
NILSSON, N. J. Problem-Solving Methods in Artificial Intelligence. New York: McGraw-Hill, 1971.
OHLANDER, R., K. PRICE, and D. R. REDDY. "Picture segmentation using a recursive region splitting method." CGIP 8, 3, December 1979.
ROBERTSON, T. V., P. H. SWAIN, and K. S. FU. "Multispectral image partitioning." TR-EE 73-26 (LARS Information Note 071373), School of Electrical Engineering, Purdue Univ., August 1973.
ROSENFELD, A. and A. C. KAK. Digital Picture Processing. New York: Academic Press, 1976.
ZUCKER, S. W. "Region growing: Childhood and adolescence." CGIP 5, 3, September 1976, 382-399.


Texture

6.1 WHAT IS TEXTURE?

The notion of texture admits of no rigid description, but a dictionary definition of texture as "something composed of closely interwoven elements" is fairly apt. The description of interwoven elements is intimately tied to the idea of texture resolution, which one might think of as the average number of pixels for each discernible texture element. If this number is large, we can attempt to describe the individual elements in some detail. However, as this number nears unity it becomes increasingly difficult to characterize these elements individually and they merge into less distinct spatial patterns. To see this variability, we examine some textures. Figure 6.1 shows "cane," "paper," "coffee beans," "brick wall," "coins," and "wire braid" after Brodatz's well-known book [Brodatz 1966]. Five of these examples are high-resolution textures: they show repeated primitive elements that exhibit some kind of variation. "Coffee beans," "brick wall," and "coins" all have obvious primitives (even if it is not so obvious how to extract these from image data). Two more examples further illustrate that one sometimes has to be creative in defining primitives. In "cane" the easiest primitives to deal with seem to be the physical holes in the texture, whereas in "wire braid" it might be better to model the physical relations of a loose weave of metallic wires. However, the paper texture does not fit nicely into this mold. This is not to say that there are no possibilities for primitive elements. One is regions of lightness and darkness formed by the ridges in the paper. A second possibility is to use the reflectance models described in Section 3.5 to compute "pits" and "bumps." However, the elements seem to be "just beyond our perceptual resolving power" [Laws 1980], or in our terms, the elements are very close in size to individual pixels.

Fig. 6.1 Six examples of texture. (a) Cane. (b) Paper. (c) Coffee beans. (d) Brick wall. (e) Coins. (f) Wire braid.

The exposition of texture takes place under four main headings:
1. Texture primitives
2. Structural models
3. Statistical models
4. Texture gradients

We have already described texture as being composed of elements of texture primitives. The main point of additional discussion on texture primitives is to refine the idea of a primitive and its relation to image resolution.

The main work that is unique to texture is that which describes how primitives are related to one another, with the aim of recognizing or classifying the texture. Two broad classes of techniques have emerged and we shall study each in turn. The structural model regards the primitives as forming a repeating pattern and describes such patterns in terms of rules for generating them. Formally, these rules can be termed a grammar. This model is best for describing textures where there is much regularity in the placement of primitive elements and the texture is imaged at high resolution. The "reptile" texture in Fig. 6.9 is an example that can be handled by the structural approach. The statistical model usually describes texture by statistical rules governing the distribution and relation of gray levels. This works well for many natural textures which have barely discernible primitives. The "paper" texture is such an example. As we shall see, we cannot be too rigid about this division since statistical models can describe pattern-like textures and vice versa, but in general the dichotomy is helpful.

The examples suggest that texture is almost always a property of surfaces. Indeed, as the example of Fig. 6.2 shows, human beings tend to relate texture elements of varying size to a plausible surface in three dimensions [Gibson 1950; Stevens 1979]. Techniques for determining surface orientation in this fashion are termed texture gradient techniques. The gradient is given both in terms of the direction of greatest change in size of primitives and in terms of the spatial placement of primitives. The notion of a gradient is very useful. For example, if the texture is embedded on a flat surface, the gradient points toward a vanishing point in the image. The chapter concludes with algorithms for computing this gradient. The gradient may be computed directly or indirectly via the computation of the vanishing point.

Fig. 6.2 Texture as a surface property.
Ch. 6 Texture

6.2 TEXTURE PRIMITIVES

The notion of a primitive is central to texture. To highlight its importance, we shall use the appellation texel (for texture element) [Kender 1978]. A texel is (loosely) a visual primitive with certain invariant properties which occurs repeatedly in different positions, deformations, and orientations inside a given area. One basic invariant property of such a unit might be that its pixels have a constant gray level, but more elaborate properties related to shape are possible. (A detailed discussion of planar shapes is deferred until Chapter 8.) Figure 6.3 shows examples of two kinds of texels: (a) ellipses of approximately constant gray level and (b) linear edge segments. Interestingly, these are nearly the two features selected as texture primitives by [Julesz 1981], who has performed extensive studies of human texture perception.

Fig. 6.3 Examples of texels. (a) Ellipses. (b) Linear segments.

For textures that can be described in two dimensions, image-based descriptions are sufficient. Texture primitives may be pixels, or aggregates of pixels such as curve segments or regions. The "coffee beans" texture can be described by an image-based model: repeated dark ellipses on a lighter background. These models describe equally well an image of texture or an image of a picture of texture. The methods for creating these aggregates were discussed in Chapters 4 and 5. As with all image-based models, three-dimensional phenomena such as occlusion must be handled indirectly. In contrast, structural approaches to texture sometimes require knowledge of the three-dimensional world producing the texture image. One example of this is Brodatz's "coins" shown in Fig. 6.1. A three-dimensional model of the way coins can be stacked is needed to understand this texture fully.

An important part of the texel definition is that primitives must occur repeatedly inside a given area. The question is: How many times? This can be answered qualitatively by imagining a window that corresponds approximately to our field of view superimposed on a very large textured area. As this window is made smaller, corresponding to moving the viewpoint closer to the texture, fewer and fewer texels are contained in it. At some distance, the image in the window no longer appears textured, or if it does, translation of the window changes the perceived texture drastically. At this point we no longer have a texture. A similar effect occurs if the window is made increasingly larger, corresponding to moving the field of view farther away from the image. At some distance textural details are blurred into continuous tones and repeated elements are no longer visible as the window is translated. (This is the basis for halftone images, which are highly textured patterns meant to be viewed from enough distance to blur the texture.) Thus the idea of an appropriate resolution, or the number of texels in a subimage, is an implicit part of our qualitative definition of texture. If the resolution is appropriate, the texture will be apparent and will "look the same" as the field of view is translated across the textured area.

Most often the appropriate resolution is not known but must be computed. Often this computation is simpler to carry out than detailed computations characterizing the primitives and hence has been used as a precursor to the latter computations. Figure 6.4 shows such a resolution-like computation, which examines the image for repeating peaks [Connors 1979].

Textures can be hierarchical, the hierarchies corresponding to different resolutions. The "brick wall" texture shows such a hierarchy. At one resolution, the highly structured pattern made by collections of bricks is in evidence; at higher resolution, the variations of the texture of each brick are visible.

6.3 STRUCTURAL MODELS OF TEXEL PLACEMENT

Highly patterned textures tessellate the plane in an ordered way, and thus we must understand the different ways in which this can be done. In a regular tessellation the polygons surrounding a vertex all have the same number of sides. Semiregular tessellations have two or more kinds of polygons (differing in number of sides) surrounding a vertex. Figure 2.11 depicts the regular tessellations of the plane. There are eight semiregular tessellations of the plane, as shown in Fig. 6.5. These tessellations are conveniently described by listing in order the number of sides of the polygons surrounding each vertex. Thus a hexagonal tessellation is described by (6, 6, 6), and every vertex in the tessellation of Fig. 6.5 can be denoted by the list (3, 12, 12). It is important to note that the tessellations of interest are those which describe the placement of primitives rather than the primitives themselves. When the primitives define a tessellation, the tessellation describing the primitive placement will be the dual of this graph in the sense of Section 5.4. Figure 6.6 shows these relationships.

Fig. 6.4 Computing texture resolutions. (a) French canvas. (b) Resolution grid for canvas. (c) Raffia. (d) Grid for raffia.

Fig. 6.5 Semiregular tessellations (the vertex types shown include (4, 8, 8), (3, 6, 3, 6), (3, 3, 3, 4, 4), and (3, 3, 4, 3, 4)).
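The vertex notation lends itself to a quick arithmetic check. The sketch below, an illustration rather than part of the original text, tests whether regular polygons with the listed numbers of sides exactly fill the 360° around a vertex, using the interior angle $180(n-2)/n$ of a regular $n$-gon. Exact fractions avoid rounding error. This is only a necessary condition: a vertex type can pass the test without extending to a tiling of the whole plane.

```python
from fractions import Fraction


def interior_angle(n):
    """Interior angle of a regular n-gon, in degrees (exact)."""
    return Fraction(180 * (n - 2), n)


def fits_around_vertex(config):
    """True if regular polygons with these side counts exactly fill 360 degrees."""
    return sum(interior_angle(n) for n in config) == 360


for config in [(6, 6, 6), (4, 8, 8), (3, 6, 3, 6), (3, 12, 12), (5, 5, 5)]:
    print(config, fits_around_vertex(config))
```

For example, (3, 7, 42) passes the angle test, yet no tiling of the plane uses that vertex type, so a full check of a semiregular tessellation also has to verify that the same arrangement can be continued consistently at every vertex.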

Fig. 6.6 The primitive placement tessellation as the dual of the primitive tessellation.

6.3.1 Grammatical Models

A powerful way of describing the rules that govern textural structure is through a grammar. A grammar describes how to generate patterns by applying rewriting rules to a small number of symbols. Through a small number of rules and symbols, the grammar can generate complex textural patterns. Of course, the symbols turn out to be related to texels. The mapping between the stored model prototype texture and an image of texture with real-world variations may be incorporated into the grammar by attaching probabilities to different rules. Grammars with such rules are termed stochastic [Fu 1974]. There is no unique grammar for a given texture; in fact, there are usually infinitely many choices for rules and symbols. Thus texture grammars are described as syntactically ambiguous. Figure 6.7 shows a syntactically ambiguous texture and two of the possible choices for primitives. This texture is also semantically ambiguous [Zucker 1976] in that alternate ridges may be thought of in three dimensions as coming out of or going into the page.

There are many variants of the basic idea of formal grammars and we shall examine three of them: shape grammars, tree grammars, and array grammars. For a basic reference, see [Hopcroft and Ullman 1979]. Shape grammars are distinguished from the other two by having high-level primitives that closely correspond to the shapes in the texture. In the examples of tree grammars and array grammars that we examine, texels are defined as pixels and this makes the grammars correspondingly more complicated. A particular texture that can be described in eight rules in a shape grammar requires 85 rules in a tree grammar [Lu and Fu 1978]. The compensating tradeoff is that pixels are gratis with the image; considerable processing must be done to derive the more complex primitives used by the shape grammar.

Fig. 6.7 Ambiguous texture, with two choices for primitives.
6.3.2 Shape Grammars

A shape grammar [Stiny and Gips 1972] is defined as a four-tuple $\langle V_t, V_m, R, S \rangle$ where:
1. $V_t$ is a finite set of shapes
2. $V_m$ is a finite set of shapes such that $V_t \cap V_m = \emptyset$
3. $R$ is a finite set of ordered pairs $(u, v)$ such that $u$ is a shape consisting of elements of $V_t^+$ and $v$ is a shape consisting of an element of $V_t^*$ combined with an element of $V_m^*$
4. $S$ is a shape consisting of an element of $V_t^*$ combined with an element of $V_m^*$.

Elements of the set $V_t$ are called terminal shape elements (or terminals). Elements of the set $V_m$ are called nonterminal shape elements (or markers). The sets $V_t$ and $V_m$ must be disjoint. Elements of the set $V_t^+$ are formed by the finite arrangement of one or more elements of $V_t$ in which any elements and/or their mirror images may be used a multiple number of times in any location, orientation, or scale. The set $V_t^* = V_t^+ \cup \{\Lambda\}$, where $\Lambda$ is the empty shape. The sets $V_m^+$ and $V_m^*$ are defined similarly. Elements $(u, v)$ of $R$ are called shape rules and are written $u \to v$; $u$ is called the left side of the rule and $v$ the right side of the rule. $u$ and $v$ usually are enclosed in identical dashed rectangles to show the correspondence between the two shapes. $S$ is called the initial shape and normally contains a $u$ such that there is a $(u, v)$ which is an element of $R$.

A texture is generated from a shape grammar by beginning with the initial shape and repeatedly applying the shape rules. The result of applying a shape rule $R$ to a given shape $s$ is another shape, consisting of $s$ with the right side of $R$ substituted in $s$ for an occurrence of the left side of $R$. Rule application to a shape proceeds as follows:
1. Find part of the shape that is geometrically similar to the left side of a rule in terms of both terminal elements and nonterminal elements (markers). There must be a one-to-one correspondence between the terminals and markers in the left side of the rule and the terminals and markers in the part of the shape to which the rule is to be applied.
2. Find the geometric transformations (scale, translation, rotation, mirror image) which make the left side of the rule identical to the corresponding part in the shape.
3. Apply those transformations to the right side of the rule.
4. Substitute the transformed right side of the rule for the part of the shape that corresponds to the left side of the rule.

The generation process is terminated when no rule in the grammar can be applied.

As a simple example, one of the many ways of specifying a hexagonal texture $(V_t, V_m, R, S)$ is

(6.1) [Shape grammar for a hexagonal texture, given as a drawing.]

Hexagonal textures can be generated by the repeated application of the single rule in $R$. They can be recognized by the application of the rule in the opposite direction to a given texture until the initial shape, $I$, is produced. Of course, the rule will generate only hexagonal textures. Similarly, the hexagonal texture in Fig. 6.8a will be recognized but the variants in Fig. 6.8b will not.

Fig. 6.8 Textures to be recognized (see text).

A more difficult example is given by the "reptile" texture. Except for the occasional new rows, a (3, 6, 3, 6) tessellation of primitives would model this texture exactly. As shown in Fig. 6.9, the new row is introduced when a seven-sided polygon splits into a six-sided polygon and a five-sided polygon. To capture this with a shape grammar, we examine the dual of this graph, which is the primitive placement graph, Fig. 6.9b. This graph provides a simple explanation of how the extra row is created; that is, the diamond pattern splits into two. Notice that the dual graph is composed solely of four-sided polygons but that some vertices are (4, 4, 4) and some are (4, 4, 4, 4, 4, 4). A shape grammar for the dual is shown in Fig. 6.10. The image texture can be obtained by forming the dual of this graph. One further refinement should be added to rules (6) and (7): so that rule (7) is used less often, the appropriate probabilities should be associated with each rule. This would make the grammar stochastic.

Fig. 6.9 (a) The reptile texture. (b) The reptile texture as a (3, 6, 3, 6) semiregular tessellation with local deformations.

6.3.3 Tree Grammars

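The symbolic form of a tree grammar is very similar to that of a shape grammar. A grammar $G_t = (V_t, V_m, r, R, S)$ is a tree grammar if
$V_t$ is a set of terminal symbols
$V_m$ is a set of symbols such that $V_m \cap V_t = \emptyset$
$r: V_t \to N$ (where $N$ is the set of nonnegative integers) is the rank associated with symbols in $V_t$
$S$ is the start symbol
$R$ is the set of rules of the form $X_0 \to x$ or $X_0 \to x(X_1, \ldots, X_{r(x)})$, with $x$ in $V_t$ and $X_0, \ldots, X_{r(x)}$ in $V_m$; in the second form $x$ is the root of a tree whose children are $X_1, \ldots, X_{r(x)}$

For a tree grammar to generate arrays of pixels, it is necessary to choose some way of embedding the tree in the array. Figure 6.11 shows two such embeddings.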
Fig. 6.10 Shape grammar for the reptile texture.

In the application to texture [Lu and Fu 1978], the notion of pyramids or hierarchical levels of resolution in texture is used. One level describes the placement of repeating patterns in texture windows (a rectangular texel placement tessellation) and another level describes texels in terms of pixels.

Fig. 6.11 Two ways of embedding a tree structure in an array. (a) Structure A. (b) Structure B. Each structure marks a starting point.

We shall illustrate these ideas with Lu and Fu's grammar for "wire braid." The texture windows are shown in Fig. 6.12a. Each of these can be described by a "sentence" in a second tree grammar. The grammar is given by:

$G_w = (V_t, V_m, r, R, S)$

where
$V_t = \{A_1, C_1\}$
$V_m = \{X, Y, Z\}$
$r = \{0, 1, 2\}$
$R$: [Tree rules (6.2) rewriting $X$, $Y$, and $Z$, given as drawings.]

and the first embedding in Fig. 6.11 is used. The pattern inside each of these windows is specified by another grammatical level:

$G = (V_t, V_m, r, R, S)$

where
$V_t = \{1, 0\}$
$V_m = \{A_1, \ldots, A_7, C_1, \ldots, C_7, N_0, \ldots, N_4\}$
$r = \{0, 1, 2\}$
$S = \{A_1, C_1\}$
$R$: [Tree rules rewriting these symbols into patterns of 0s and 1s, given as drawings.]

The application of these rules generates the two different patterns of pixels shown in Fig. 6.13.


6.3.4 Array Grammars

Like tree grammars, array grammars use hierarchical levels of resolution [Milgram and Rosenfeld 1971; Rosenfeld 1971]. Array grammars are different from tree grammars in that they do not use the tree-array embedding. Instead, prodigious use is made of a blank or null symbol to make sure the rules are applied in appropriate contexts. A simple array grammar for generating a checkerboard pattern is

$G = (V_t, V_m, R)$

where $V_t = \{0, 1\}$ (corresponding to black and white pixels, respectively) and $V_m = \{b, S\}$. Here $b$ is a "blank" symbol used to provide context for the application of the rules. Another notational convenience is to use a subscript to denote the orientation of symbols. For example, when describing the rules $R$ we use

$$0_x\,b \to 0_x\,1$$

to summarize the four rules

$$\begin{matrix} 0 \\ b \end{matrix} \to \begin{matrix} 0 \\ 1 \end{matrix}, \qquad \begin{matrix} b \\ 0 \end{matrix} \to \begin{matrix} 1 \\ 0 \end{matrix}, \qquad b\;0 \to 1\;0, \qquad 0\;b \to 0\;1$$

where $x$ is one of $\{U, D, L, R\}$.

Fig. 6.12 Texture window and grammar (see text).

Thus the checkerboard rule set is given by

$R$: $S \to 0$ or $1$; $\quad 0_x\,b \to 0_x\,1$; $\quad 1_x\,b \to 1_x\,0$, with $x$ in $\{U, D, L, R\}$

Fig. 6.13 Texture generated by tree grammar.

A compact encoding of textural patterns [Jayaramamurthy 1979] uses levels of array grammars defined on a pyramid. The terminal symbols of one layer are the start symbols of the next grammatical layer defined lower down in the pyramid. This corresponds nicely to the idea of having one grammar to generate primitives and another to generate the primitive placement tessellations.

As another example, consider the herringbone pattern in Fig. 6.14a, which is composed of 4 × 3 arrays of a particular placement pattern as shown in Fig. 6.14b. The following grammar is sufficient to generate the placement pattern:

$G_W = (V_t, V_m, R, S)$

where $V_m = \{b, S\}$ and

$R$: $S \to a$; $\quad a_x\,b \to a_x\,a$, with $x$ in $\{U, D, L, R\}$

We have not been precise in specifying how the terminal symbol is projected onto the lower level. Assume without loss of generality that it is placed in the upper left-hand corner, the rest of the subarray being initially blank symbols. Thus a simple grammar for the primitive is

$G_1 = (V_t, V_n, R, S)$

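where $V_t = \{0, 1\}$, $V_n = \{a, b\}$, and

$$R:\quad \begin{matrix} a & b & b & b \\ b & b & b & b \\ b & b & b & b \end{matrix} \;\to\; \begin{matrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{matrix}$$

Fig. 6.14 Steps in generating a herringbone texture with an array grammar (panels: initial array at level 1, terminal array at level 1, final array).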
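The checkerboard grammar and the two-level herringbone construction of this section can be simulated in a few lines. This sketch assumes, as the text does for projected terminals, that the start symbol occupies the upper-left cell of an array otherwise filled with blanks. It applies $0_x b \to 0_x 1$ and $1_x b \to 1_x 0$ in breadth-first order; any other order gives the same array, because each rewrite flips the value, so every cell ends up with the start value flipped once per step of its grid distance from the start cell. The herringbone function lets the placement grammar fill every cell with $a$ and then expands each $a$ with the single rule of $G_1$. All names are hypothetical.

```python
from collections import deque

FLIP = {"0": "1", "1": "0"}


def checkerboard(rows, cols, start="0"):
    """Derive a checkerboard with S -> 0 | 1, 0_x b -> 0_x 1, 1_x b -> 1_x 0."""
    grid = [["b"] * cols for _ in range(rows)]  # "b" is the blank symbol
    grid[0][0] = start  # apply S -> 0 (or S -> 1) at the upper-left cell
    frontier = deque([(0, 0)])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # x = U, D, L, R
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == "b":
                grid[rr][cc] = FLIP[grid[r][c]]  # rewrite the blank neighbor
                frontier.append((rr, cc))
    return ["".join(row) for row in grid]


# Right-hand side of G_1's single rule: the 3 x 4 herringbone primitive.
PRIMITIVE = ["0010", "0101", "1000"]


def herringbone(tiles_down, tiles_across):
    """Level 1 fills every cell with a (S -> a, a_x b -> a_x a); level 2
    replaces each a by the primitive placed at its upper-left corner."""
    return [PRIMITIVE[r % 3] * tiles_across for r in range(3 * tiles_down)]


print("\n".join(checkerboard(3, 4)))
print("\n".join(herringbone(2, 2)))
```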


6.4 TEXTURE AS A PATTERN RECOGNITION PROBLEM

Many textures do not have the nice geometrical regularity of "reptile" or "wire braid"; instead, they exhibit variations that are not satisfactorily described by shapes, but are best described by statistical models. Statistical pattern recognition is a paradigm that can classify statistical variations in patterns. (There are other statistical methods of describing texture [Pratt et al. 1981], but we will focus on statistical pattern recognition since it is the most widely used for computer vision purposes.) There is a voluminous literature on pattern recognition, including several excellent texts (e.g., [Fu 1968; Tou and Gonzalez 1974; Fukunaga 1972]), and the ideas have much wider application than their use here, but they seem particularly appropriate for low-resolution textures, such as those seen in aerial images [Weszka et al. 1976]. The pattern recognition approach to the problem is to classify instances of a texture in an image into a set of classes. For example, given the textures in Fig. 6.15, the choice might be between the classes "orchard," "field," "residential," and "water."

The basic notion of pattern recognition is the feature vector. The feature vector $\mathbf{v}$ is a set of measurements $\{v_1, \ldots, v_m\}$ which is supposed to condense the description of relevant properties of the textured image into a small, Euclidean feature space of $m$ dimensions. Each point in feature space represents a value for the feature vector applied to a different image (or subimage) of texture. The measurement values for a feature should be correlated with its class membership. Figure 6.16a shows a two-dimensional space in which the features exhibit the desired correlation property. Feature vector values cluster according to the texture from which they were derived. Figure 6.16b shows a bad choice of features (measurements) which does not separate the different classes.

The pattern recognition paradigm divides the problem into two phases: training and test. Usually, during a training phase, feature vectors from known samples are used to partition feature space into regions representing the different classes.
However, self-teaching can be done; the classifier derives its own partitions. Feature selection can be based on parametric or nonparametric models of the distributions of points in feature space. In the former case, analytic solutions are sometimes available. In the latter, feature vectors are clustered into groups which are taken to indicate partitions. During a test phase the feature space partitions are used to classify feature vectors from unknown samples. Figure 6.17 shows this process.

Given that the data are reasonably well behaved, there are many methods for clustering feature vectors [Fukunaga 1972; Tou and Gonzalez 1974; Fu 1974].
Sec. 6.4 Texture asaPattern Recognition Problem 1 8 1

Fig. 6.15 Aerial image textures for discrimination.


Ch. 6 Texture


One popular way of doing this is to use prototype points for each class and a nearest-neighbor rule [Cover 1968]: assign $\mathbf{v}$ to class $w_i$ if $i$ minimizes

$$d(\mathbf{v}, \mathbf{v}_{w_i})$$

where $\mathbf{v}_{w_i}$ is the prototype point for class $w_i$.

Parametric techniques assume information about the feature vector probability distributions to find rules that maximize the likelihood of correct classification: assign $\mathbf{v}$ to class $w_i$ if $i$ maximizes

$$p(w_i \mid \mathbf{v})$$
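The nearest-neighbor rule can be sketched in a few lines of NumPy. The text does not say how the prototypes $\mathbf{v}_{w_i}$ are chosen; this sketch assumes, for illustration only, that each prototype is the mean of that class's training vectors, and it uses Euclidean distance for $d$. The function names are hypothetical.

```python
import numpy as np


def class_prototypes(features, labels, n_classes):
    """Training phase (one simple choice, assumed here): the mean of each
    class's training vectors serves as that class's prototype v_{w_i}."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    return np.array([X[y == c].mean(axis=0) for c in range(n_classes)])


def nearest_prototype(v, prototypes):
    """Test phase: return the index i that minimizes d(v, v_{w_i}),
    using Euclidean distance in feature space."""
    v = np.asarray(v, dtype=float)
    distances = np.linalg.norm(np.asarray(prototypes, dtype=float) - v, axis=1)
    return int(np.argmin(distances))


# Toy two-dimensional feature space with two well-separated classes.
train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = [0, 0, 1, 1]
protos = class_prototypes(train, labels, n_classes=2)
print(nearest_prototype([1.0, 1.0], protos))  # prints 0
print(nearest_prototype([5.0, 6.0], protos))  # prints 1
```

With well-separated clusters, as in Fig. 6.16a, almost any prototype choice works; with overlapping clusters, as in Fig. 6.16b, no choice of prototypes can separate the classes.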

Fig. 6.16 Feature space for texture discrimination: (a) effective features; (b) ineffective features.


Fig. 6.17 Pattern recognition paradigm.

The distributions may also be used to formulate rules that minimize errors.

Picking good features is the essence of pattern recognition. No elaborate formalism will work well for bad features such as those of Fig. 6.16b. On the other hand, almost any method will work for very good features. For this reason, texture is a good domain for pattern recognition: it is fairly easy to define features that (1) cluster in feature space according to different classes, and (2) can separate texture classes.

The ensuing subsections describe features that have worked well. These subsections are in reverse order from those of Section 6.2 in that we begin with features defined on pixels (Fourier subspaces, gray-level dependencies) and conclude with features defined on higher-level texels such as regions. However, the lesson is the same as with the grammatical approach: hard work spent in obtaining high-level primitives can both improve and simplify the texture model. Space does not permit a discussion of many texture features; instead, we limit ourselves to a few representative samples. For further reading, see [Haralick 1978].

6.4.1 Texture Energy

Fourier Domain Basis

If a texture is at all spatially periodic or directional, its power spectrum will tend to have peaks for corresponding spatial frequencies. These peaks can form the basis of features of a pattern recognition discriminator. One way to define features is to search Fourier space directly [Bajcsy and Lieberman 1976]. Another is to partition Fourier space into bins. Two kinds of bins, radial and angular, are commonly used, as shown in Fig. 6.18. These bins, together with the Fourier power spectrum, are used to define features. If $F$ is the Fourier transform, the Fourier power spectrum is given by $|F|^2$. Radial features are given by

$$V_{r_1 r_2} = \iint |F(u, v)|^2 \, du \, dv \qquad (6.5)$$

where the limits of integration are defined by

$$r_1^2 \le u^2 + v^2 < r_2^2, \qquad 0 \le u, v \le n - 1$$

where $[r_1, r_2]$ is one of the radial bins and $\mathbf{V}$ is the vector (not related to $\mathbf{v}$) defined by different values of $r_1$ and $r_2$. Radial features are correlated with texture coarseness. A smooth texture will have high values of $V_{r_1 r_2}$ for small radii, whereas a coarse, grainy texture will tend to have relatively higher values for larger radii.

Features that measure angular orientation are given by

$$V_{\theta_1 \theta_2} = \iint |F(u, v)|^2 \, du \, dv \qquad (6.6)$$

where the limits of integration are defined by

$$\theta_1 \le \tan^{-1}\frac{v}{u} < \theta_2, \qquad 0 \le u, v \le n - 1$$

where $[\theta_1, \theta_2)$ is one of the sectors and $\mathbf{V}$ is defined by different values of $\theta_1$ and $\theta_2$. These features exploit the sensitivity of the power spectrum to the directionality of the texture. If a texture has many lines or edges in a given direction $\theta$, $|F|^2$ will tend to have high values clustered around the direction in frequency space $\theta + \pi/2$.

Fig. 6.18 Partitioning the Fourier domain into bins.
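The ring and sector features of Eqs. (6.5) and (6.6) can be approximated on a discrete image with NumPy's FFT. This is a sketch under stated assumptions: it uses signed (centered) integer frequencies from `np.fft.fftfreq` rather than the raw indices $0, \ldots, n-1$ of the limits above, and it folds angles into $[0, \pi)$ using the symmetry $|F(u,v)| = |F(-u,-v)|$ of a real image's spectrum. The bin edges are arbitrary illustrative values.

```python
import numpy as np


def ring_wedge_features(image, radius_edges, angle_edges):
    """Sum the power spectrum |F(u, v)|^2 over radial bins (Eq. 6.5)
    and angular sectors (Eq. 6.6) of the discrete Fourier domain."""
    image = np.asarray(image, dtype=float)
    power = np.abs(np.fft.fft2(image)) ** 2          # |F|^2
    n_rows, n_cols = image.shape
    # Signed integer frequencies: u along columns, v along rows.
    u = np.fft.fftfreq(n_cols) * n_cols
    v = np.fft.fftfreq(n_rows) * n_rows
    U, V = np.meshgrid(u, v)                          # both shaped like the image
    r_sq = U**2 + V**2
    # |F(u,v)| = |F(-u,-v)| for a real image, so fold angles into [0, pi).
    angle = np.mod(np.arctan2(V, U), np.pi)
    radial = np.array([power[(r_sq >= lo**2) & (r_sq < hi**2)].sum()
                       for lo, hi in zip(radius_edges[:-1], radius_edges[1:])])
    angular = np.array([power[(angle >= lo) & (angle < hi)].sum()
                        for lo, hi in zip(angle_edges[:-1], angle_edges[1:])])
    return radial, angular


# Stripes that vary along x (vertical edges), 4 cycles per 32 pixels.
x = np.arange(32)
stripes = np.tile(np.cos(2 * np.pi * 4 * x / 32), (32, 1))
radial, angular = ring_wedge_features(stripes, [0, 2, 6, 20],
                                      [0, np.pi / 4, 3 * np.pi / 4, np.pi])
```

The vertical stripes put all their energy in the ring $2 \le r < 6$ and in the sector around angle 0, which is the edge direction $\pi/2$ plus $\pi/2$ (modulo $\pi$), as the text predicts.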

Texture Energy in the Spatial Domain

From Section 2.2.4 we know that the Fourier approach could also be carried out in the image domain. This is the approach taken in [Laws 1980]. The advantage of this approach is that the basis is not the Fourier basis but a variant that is more matched to intuition about texture features. Figure 6.19 shows the most important of Laws' 12 basis functions.

The image is first histogram-equalized (Section 3.2). Then 12 new images are made by convolving the original image with each of the basis functions (i.e., $f'_k = f * h_k$ for basis functions $h_1, \ldots, h_{12}$). Then each of these images is transformed into an "energy" image by the following transformation: each pixel in the convolved image is replaced by an average of the absolute values in a local window of 15 × 15 pixels centered over the pixel:

$$f''_k(x, y) = \frac{1}{15^2} \sum_{(x', y') \in \text{window}} |f'_k(x', y')| \qquad (6.7)$$

The transformation $f \to f''_k$, $k = 1, \ldots, 12$, is termed a "texture energy transform" by Laws and is analogous to the Fourier power spectrum. The $f''_k$, $k = 1, \ldots, 12$, form a set of features for each point in the image which are used in a nearest-neighbor classifier. Classification details may be found in [Laws 1980]. Our interest is in the particular choice of basis functions used.

Figure 6.20 shows a composite of natural textures [Brodatz 1966] used in Laws's experiments. Each texture is digitized into a 128 × 128 pixel subimage. The texture energy transforms were applied to this composite image and each pixel was classified into one of the eight categories. The average classification accuracy was about 87% for interior regions of the subimages. This is a very good result for textures that are similar.
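The texture energy transform can be sketched with plain NumPy. Assumptions: the four masks of Fig. 6.19 are written as outer products of the one-dimensional vectors conventionally labeled L5, E5, S5, and R5 in Laws' work (these names are not from the text above); the histogram-equalization step is omitted; and borders are handled by replicating edge pixels. Function names are hypothetical.

```python
import numpy as np

# One-dimensional vectors conventionally used to build Laws' masks
# (the names L5, E5, S5, R5 are an assumption, not from the text).
L5 = np.array([1, 4, 6, 4, 1], dtype=float)    # level
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)  # edge
S5 = np.array([-1, 0, 2, 0, -1], dtype=float)  # spot
R5 = np.array([1, -4, 6, -4, 1], dtype=float)  # ripple

# The four 5 x 5 masks of Fig. 6.19 as outer products.
MASKS = [np.outer(E5, L5), np.outer(R5, R5), np.outer(E5, S5), np.outer(L5, S5)]


def convolve_same(image, kernel):
    """2-D convolution f * h with output the same size as f;
    borders are handled by replicating edge pixels."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    flipped = kernel[::-1, ::-1]                  # convolution flips the mask
    rows, cols = image.shape
    out = np.zeros((rows, cols))
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + rows, j:j + cols]
    return out


def texture_energy(image, masks=MASKS, window=15):
    """f''_k: local mean of |f * h_k| over a window x window neighborhood (Eq. 6.7)."""
    image = np.asarray(image, dtype=float)
    box = np.ones((window, window)) / window**2
    return np.stack([convolve_same(np.abs(convolve_same(image, h)), box)
                     for h in masks])


# A horizontal step edge: the E5/L5 mask responds, the L5/S5 mask does not.
step = np.zeros((20, 20))
step[10:, :] = 1.0
energy = texture_energy(step)
```

Each mask sums to zero, so a constant image produces zero energy everywhere; the features respond only to local structure, in the spirit of a band-limited power spectrum.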
6.4.2 Spatial Gray-Level Dependence

Spatial gray-level dependence (SGLD) matrices are one of the most popular sources of features [Kruger et al. 1974; Hall et al. 1971; Haralick et al. 1973]. The SGLD approach computes an intermediate matrix of measures from the digitized image data, and then defines features as functions on this intermediate matrix. Given an image $f$ with a set of discrete gray levels $I$, we define for each of a set of discrete values of $d$ and $\theta$ the intermediate matrix $S(d, \theta)$ as follows: $S(i, j \mid d, \theta)$, an entry in the matrix, is the number of times gray level $i$ is oriented with respect to gray level $j$ such that where $f(\mathbf{x}) = i$ and $f(\mathbf{y}) = j$ then

$$\mathbf{y} = \mathbf{x} + (d\cos\theta,\ d\sin\theta)$$
    -1  -4  -6  -4  -1          1  -4   6  -4   1
    -2  -8 -12  -8  -2         -4  16 -24  16  -4
     0   0   0   0   0          6 -24  36 -24   6
     2   8  12   8   2         -4  16 -24  16  -4
     1   4   6   4   1          1  -4   6  -4   1

     1   0  -2   0   1         -1   0   2   0  -1
     2   0  -4   0   2         -4   0   8   0  -4
     0   0   0   0   0         -6   0  12   0  -6
    -2   0   4   0  -2         -4   0   8   0  -4
    -1   0   2   0  -1         -1   0   2   0  -1

Fig. 6.19 Laws' basis functions (these are the low-order four of twelve actually used).



Fig. 6.20 (a) Texture composite; (b) classification.

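Note that the gray-level values appear as indices of the matrix $S$, implying that they are taken from some well-ordered discrete set $0, \ldots, K$. Since

$$S(d, \theta) = S^{T}(d, \theta + \pi)$$

common practice is to restrict $\theta$ to multiples of $\pi/4$. Furthermore, information is not usually retained at both $\theta$ and $\theta + \pi$. The reasoning for the latter step is that for most texture discrimination tasks, the information is redundant. Thus we define

$$\hat{S}(d, \theta) = \tfrac{1}{2}\left[S(d, \theta) + S(d, \theta + \pi)\right]$$

The intermediate matrices $\hat{S}$ yield potential features. Commonly used features are:

1. Energy

$$E(d, \theta) = \sum_{i=0}^{K} \sum_{j=0}^{K} \left[\hat{S}(i, j \mid d, \theta)\right]^2 \qquad (6.8)$$

2. Entropy

$$H(d, \theta) = -\sum_{i=0}^{K} \sum_{j=0}^{K} \hat{S}(i, j \mid d, \theta) \log \hat{S}(i, j \mid d, \theta) \qquad (6.9)$$

3. Correlation

$$C(d, \theta) = \frac{\sum_{i=0}^{K} \sum_{j=0}^{K} (i - \mu_x)(j - \mu_y)\, \hat{S}(i, j \mid d, \theta)}{\sigma_x \sigma_y} \qquad (6.10)$$

4. Inertia

$$I(d, \theta) = \sum_{i=0}^{K} \sum_{j=0}^{K} (i - j)^2\, \hat{S}(i, j \mid d, \theta) \qquad (6.11)$$

5. Local homogeneity

$$L(d, \theta) = \sum_{i=0}^{K} \sum_{j=0}^{K} \frac{1}{1 + (i - j)^2}\, \hat{S}(i, j \mid d, \theta) \qquad (6.12)$$

where $\hat{S}(i, j \mid d, \theta)$ is the $(i, j)$th element of $\hat{S}(d, \theta)$, and

$$\mu_x = \sum_{i=0}^{K} i \sum_{j=0}^{K} \hat{S}(i, j \mid d, \theta) \qquad (6.13a)$$

$$\mu_y = \sum_{j=0}^{K} j \sum_{i=0}^{K} \hat{S}(i, j \mid d, \theta) \qquad (6.13b)$$

$$\sigma_x^2 = \sum_{i=0}^{K} (i - \mu_x)^2 \sum_{j=0}^{K} \hat{S}(i, j \mid d, \theta) \qquad (6.13c)$$

$$\sigma_y^2 = \sum_{j=0}^{K} (j - \mu_y)^2 \sum_{i=0}^{K} \hat{S}(i, j \mid d, \theta) \qquad (6.13d)$$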
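The SGLD construction and the five features translate directly into NumPy. This is a sketch under assumptions not stated in the text: pixel positions are (column, row) with the displacement rounded to whole pixels, and $\hat{S}$ is normalized to sum to 1 so that the entropy and correlation behave like their probabilistic counterparts. The function names are hypothetical.

```python
import numpy as np


def sgld_matrix(image, d, theta, levels):
    """S(i, j | d, theta): count pixel pairs with f(x) = i and f(y) = j,
    where y = x + (d cos theta, d sin theta). Here x is (column, row),
    and the displacement is rounded to whole pixels."""
    dc = int(round(d * np.cos(theta)))   # column offset
    dr = int(round(d * np.sin(theta)))   # row offset
    rows, cols = image.shape
    S = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                S[image[r, c], image[r2, c2]] += 1
    return S


def sgld_features(image, d, theta, levels):
    """Features (6.8)-(6.12) from S_hat = (S(theta) + S(theta + pi)) / 2,
    normalized here to sum to 1 (an assumption; the text does not normalize).
    Correlation is undefined (division by zero) for a constant image."""
    S = sgld_matrix(image, d, theta, levels)
    P = 0.5 * (S + S.T)                  # S(d, theta + pi) is the transpose of S(d, theta)
    P = P / P.sum()
    i, j = np.indices(P.shape)
    mu_x, mu_y = np.sum(i * P), np.sum(j * P)          # (6.13a), (6.13b)
    sigma_x = np.sqrt(np.sum((i - mu_x) ** 2 * P))     # (6.13c)
    sigma_y = np.sqrt(np.sum((j - mu_y) ** 2 * P))     # (6.13d)
    nz = P > 0                                         # treat 0 log 0 as 0
    return {
        "energy": np.sum(P ** 2),
        "entropy": -np.sum(P[nz] * np.log(P[nz])),
        "correlation": np.sum((i - mu_x) * (j - mu_y) * P) / (sigma_x * sigma_y),
        "inertia": np.sum((i - j) ** 2 * P),
        "homogeneity": np.sum(P / (1.0 + (i - j) ** 2)),
    }


# A 0/1 checkerboard: horizontal neighbors (d = 1, theta = 0) always differ.
board = np.indices((4, 4)).sum(axis=0) % 2
feats = sgld_features(board, d=1, theta=0.0, levels=2)
```

For the checkerboard every horizontal pair is (0, 1) or (1, 0), so the inertia is 1, the correlation is -1, and the local homogeneity is 0.5: a maximally "busy" texture at this displacement.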

One important aspect of this approach is that the features chosen do not have psychological correlates [Tamura et al. 1978]. For example, none of the measures described would take on specific values corresponding to our notions of "rough" or "smooth." Also, the texture gradient is difficult to define in terms of SGLD feature values [Bajcsy and Lieberman 1976].

6.4.3 Region Texels

Region texels are an image-based way of defining primitives above the level of pixels. Rather than defining features directly as functions of pixels, a region segmentation of the image is created first. Features can then be defined in terms of the shape of the resultant regions, which are often more intuitive than the pixel-related features. Naturally, the approach of using edge elements is also possible. We shall discuss this in the context of texture gradients.

The idea of using regions as texture primitives was pursued in [Maleson et al. 1977]. In that implementation, all regions are ultimately modeled as ellipses and a corresponding five-parameter shape description is computed for each region. These parameters only define gross region shape, but the five-parameter primitives seem to work well for many domains. The texture image is segmented into regions in two steps. Initially, the modified version of Algorithm 5.1 that works for gray-level images is used. Figure 6.21 shows this example of the segmentation applied to a sample of "straw" texture. Next, parameters of the region grower are controlled so as to encourage convex regions which are fit with ellipses. Figure 6.22 shows the resultant ellipses for the "straw" texture. One set of ellipse parameters is $\mathbf{x}_0, a, b, \theta$, where $\mathbf{x}_0$ is the origin, $a$ and $b$ are the major and minor axis lengths, and $\theta$ is the orientation of the major axis (Appendix 1). Besides these shape parameters, elliptical texels are also described by their average gray level. Figure 6.23 gives a qualitative indication of how ranges on feature values reflect different texels.
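One way to obtain the five ellipse parameters for a segmented region is to match its first and second moments. The text does not say how [Maleson et al. 1977] fit their ellipses, so this moment-based fit is a swapped-in technique, shown only as a sketch. It returns semi-axis lengths (double them for full axis lengths) and, as in the text, the region's average gray level. The function name is hypothetical.

```python
import numpy as np


def ellipse_texel(mask, image=None):
    """Fit an ellipse with the same first and second moments as a region.
    Returns (x0, y0, a, b, theta, mean_gray): center, semi-major and
    semi-minor axis lengths, major-axis orientation in [0, pi), and the
    region's average gray level (None if no image is given)."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    x0, y0 = xs.mean(), ys.mean()
    dx, dy = xs - x0, ys - y0
    cov = np.array([[np.mean(dx * dx), np.mean(dx * dy)],
                    [np.mean(dx * dy), np.mean(dy * dy)]])
    evals, evecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    # A filled ellipse with semi-axis s has variance s**2 / 4 along that axis.
    b, a = 2.0 * np.sqrt(evals)
    vx, vy = evecs[:, 1]                      # eigenvector of the larger eigenvalue
    theta = np.mod(np.arctan2(vy, vx), np.pi)
    mean_gray = float(np.asarray(image)[mask].mean()) if image is not None else None
    return x0, y0, a, b, theta, mean_gray


# Synthetic region: ellipse centered at (20, 15) with semi-axes 10 and 4.
yy, xx = np.mgrid[0:30, 0:40]
region = ((xx - 20) / 10.0) ** 2 + ((yy - 15) / 4.0) ** 2 <= 1.0
x0, y0, a, b, theta, gray = ellipse_texel(region, image=5.0 * region)
```

On a pixel grid the recovered axes are only approximate (here roughly 10.1 and 3.8), which is adequate for the gross shape description the text has in mind.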

Fig. 6.21 Region segmentation for straw texture: (a) image; (b) with region boundaries.

6.5 THE TEXTURE GRADIENT

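Three methods for computing surface orientation from texture are depicted in Fig. 6.24. The methods assume that the texture is embedded on a planar surface. The first method uses the projected size of the texture primitives; the size of these primitives constrains the orientation of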

Fig. 6.22 Ellipses for straw texture.



Fig. 6.23 Features defined on ellipses. Ranges of average size (axis from 35 to 90) and of average eccentricity (axis from 0.1 to 0.7) for the textures bubbles, fiber, grass, leather, paper, raffia, sand, screen, straw, and water.

the plane in the following manner. The direction of maximum rate of change of projected primitive size is the direction of the texture gradient. The orientation of this direction with respect to the image coordinate frame determines how much the plane is rotated about the camera line of sight. The magnitude of the gradient can help determine how much the plane is tilted with respect to the camera, but knowledge about the camera geometry is also required. We have seen these ideas before in the form of gradient space; the rotation and tilt characterization is a polar coordinate representation of gradients.

Fig. 6.24 (a), (b), (c) Methods for calculating surface orientation from texture.


The second way to measure surface orientation is by knowing the shape of the texel itself. For example, a texture composed of circles appears as ellipses on the tilted surface. The orientation of the principal axes defines rotation with respect to the camera, and the ratio of minor to major axes defines tilt [Stevens 1979].

Finally, if the texture is composed of a regular grid of texels, we can compute vanishing points. For a perspective image, vanishing points on a plane $P$ are the projection onto the image plane of the points at infinity in a given direction. In the examples here, the texels themselves are (conveniently) small line segments on a plane that are oriented in two orthogonal directions in the physical world. The general method applies whenever the placement tessellation defines lines of texels. Two vanishing points that arise from texels on the same surface can be used to determine orientation as follows. The line joining the vanishing points provides the orientation of the surface, and the vertical position of the plane with respect to the $z$ axis (i.e., the intersection of the line joining the vanishing points with $x = 0$) determines the tilt of the plane.

Line segment textures indicate vanishing points [Kender 1978]. As shown in Fig. 6.25, these segments could arise quite naturally from an urban image of the windows of a building which has been processed with an edge operator.

As discussed in Chapter 4, lines in images can be detected by detecting their parameters with a Hough algorithm. For example, by using the line parameterization

$$x\cos\theta + y\sin\theta = r$$

and by knowing the orientation of the line in terms of its gradient $\mathbf{g} = (\Delta x, \Delta y)$, a line segment $(x, y, \Delta x, \Delta y)$ can be mapped into $r$-$\theta$ space by using the relations

$$r = \frac{\Delta x\, x + \Delta y\, y}{\sqrt{\Delta x^2 + \Delta y^2}} \qquad (6.14)$$

$$\theta = \tan^{-1}\frac{\Delta y}{\Delta x} \qquad (6.15)$$

These relationships can be derived by using Fig. 6.26 and some geometry. The Cartesian coordinates of the $r$-$\theta$ space vector are given by

$$\mathbf{a} = \frac{(\mathbf{g} \cdot \mathbf{x})\, \mathbf{g}}{|\mathbf{g}|^2} \qquad (6.16)$$
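Eqs. (6.14) through (6.16) can be checked numerically. This sketch treats $(\Delta x, \Delta y)$ as the gradient, that is, the line's normal direction, and uses `arctan2` in place of $\tan^{-1}(\Delta y / \Delta x)$ so that $\Delta x = 0$ is allowed; the function names are hypothetical.

```python
import numpy as np


def segment_to_rtheta(x, y, dx, dy):
    """Eqs. (6.14)-(6.15): map segment (x, y, dx, dy), where g = (dx, dy) is
    the gradient (normal) direction, to the line x cos(theta) + y sin(theta) = r.
    arctan2 replaces tan^-1(dy/dx) so that dx = 0 is allowed."""
    r = (dx * x + dy * y) / np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)
    return r, theta


def rtheta_vector(x, y, dx, dy):
    """Eq. (6.16): Cartesian form a = (g . x) g / |g|^2 of the (r, theta) point."""
    g = np.array([dx, dy], dtype=float)
    p = np.array([x, y], dtype=float)
    return (g @ p) / (g @ g) * g


# Two segments on the line x + y = 2 map to the same (r, theta):
# both give r = sqrt(2), theta = pi/4.
r1, t1 = segment_to_rtheta(2.0, 0.0, 1.0, 1.0)
r2, t2 = segment_to_rtheta(0.5, 1.5, 1.0, 1.0)
a = rtheta_vector(2.0, 0.0, 1.0, 1.0)   # the foot of the normal from the origin
```

Because every segment of a given line yields the same $(r, \theta)$, collinear texels such as the set $L_1$ accumulate at a single point.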

Fig. 6.25 Orthogonal line segments comprising a texture.



Fig. 6.26 $r$-$\theta$ transform.

Using this transformation, the set of line segments $L_1$ shown in Fig. 6.27 are all mapped into a single point in $r$-$\theta$ space. Furthermore, the set of lines $L_2$ which have the same vanishing point $(x_v, y_v)$ project onto a circle in $r$-$\theta$ space with the line segment $((0, 0), (x_v, y_v))$ as a diameter. This scheme has two drawbacks: (1) vanishing points at infinity are projected into infinity, and (2) circles require some effort to detect. Hence we are motivated to use the transform

$$(x, y, \Delta x, \Delta y) \to \left(\frac{k}{r},\ \theta\right)$$

for some constant $k$. Now vanishing points at infinity are projected into the origin and the locus of the set of points $L_2$ is now a line. This line is perpendicular to the vector $\mathbf{x}_v$ and $k/|\mathbf{x}_v|$ units from the origin, as shown in Fig. 6.28. It can be detected by a second stage of the Hough transform; each point $\mathbf{a} = (a, b)$ is mapped into an $r'$-$\theta'$ space. For every $\mathbf{a}$, compute all the $r'$, $\theta'$ such that

$$a\cos\theta' + b\sin\theta' = r' \qquad (6.17)$$

and increment that location in the appropriate $r'$, $\theta'$ accumulator array. In this second space a vanishing point is detected as

$$r' = \frac{k}{|\mathbf{x}_v|} \qquad (6.18)$$

$$\theta' = \tan^{-1}\frac{y_v}{x_v} \qquad (6.19)$$
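The $(k/r, \theta)$ transform and the line it produces can be sketched as follows. Every transformed point $\mathbf{p}$ from a line through $\mathbf{x}_v$ satisfies $\mathbf{p} \cdot \mathbf{x}_v = k$, so instead of running the second Hough stage, this sketch solves that linear system by least squares, a swapped-in estimator used here only to keep the example short. It then reports where Eqs. (6.18) and (6.19) say the second-stage peak would fall. Function names are hypothetical.

```python
import numpy as np


def inverse_rtheta_point(x, y, dx, dy, k=1.0):
    """Map segment (x, y, dx, dy) to (k / r)(cos theta, sin theta), the
    (k/r, theta) transform in Cartesian form. Undefined when r = 0
    (lines through the image origin)."""
    r = (dx * x + dy * y) / np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)
    return (k / r) * np.array([np.cos(theta), np.sin(theta)])


def vanishing_point(points, k=1.0):
    """Points from lines through a common vanishing point x_v satisfy
    p . x_v = k, i.e. they lie on one line. Solve for x_v by least squares
    (a stand-in for the second Hough stage)."""
    P = np.asarray(points, dtype=float)
    x_v, *_ = np.linalg.lstsq(P, np.full(len(P), k), rcond=None)
    return x_v


def second_stage_peak(x_v, k=1.0):
    """Where the second Hough stage would peak, Eqs. (6.18)-(6.19)."""
    return k / np.linalg.norm(x_v), np.arctan2(x_v[1], x_v[0])


# Segments on four lines through the vanishing point (4, 3).
x_v_true = np.array([4.0, 3.0])
points = []
for d in [(1.0, 0.0), (0.0, 1.0), (1.0, 2.0), (2.0, -1.0)]:
    d = np.array(d)
    p = x_v_true + 1.5 * d                   # a point on the line
    n = np.array([-d[1], d[0]])              # gradient direction = line normal
    points.append(inverse_rtheta_point(p[0], p[1], n[0], n[1]))
x_v = vanishing_point(points)
r_prime, theta_prime = second_stage_peak(x_v)
```

A true Hough accumulator is more robust when many segments do not belong to $L_2$; least squares is adequate only when all segments share the vanishing point, as in this synthetic example.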

Fig. 6.27 Detecting the vanishing point with the Hough transform.



Fig. 6.28 Vanishing point loci.

In Kender's application the texels and their placement tessellation are similar in that the primitives are parallel to arcs in the placement tessellation graph. In a more general application the tessellation could be computed by connecting the centers of primitives.

EXERCISES

6.1 Devise a computer algorithm that, given a set of texels from each of a set of different "windows" of the textured image, checks to see if the resolution is appropriate. In other words, try to formalize the discussion of resolution in Section 6.2.

6.2 Are any of the grammars in Section 6.3 suitable for a parallel implementation (i.e., parallel application of rules)? Discuss, illustrating your arguments with examples or counterexamples from each of the three main grammatical types (shape, tree, and array grammars).

6.3 Are shape, array, and tree grammars context-free or context-sensitive as defined? Can such grammars be translated into "traditional" (string) grammars? If not, how are they different; and if so, why are they useful?

6.4 Show how the generalized Hough transform (Section 4.3) could be applied to texel detection.

6.5 In an outdoor scene, there is the problem of different scales. For example, consider the grass. Grass that is close to an observer will appear "sharp" and composed of primitive elements, yet grass distant from an observer will be much more "fuzzy" and homogeneous. Describe how one might handle this problem.

6.6 The texture energy transform (Section 6.4.1) is equivalent to a set of Fourier domain operations. How do the texture energy features compare with the ring and sector features?

6.7 The texture gradient is presumably a gradient in some aspect of texture. What aspect is it, and how might it be quantified so that texture descriptions can be made gradient-independent?

6.8 Write a texture region grower and apply it to natural scenes.

REFERENCES
BAJCSY, R. and L. LIEBERMAN. "Texture gradient as a depth cue." CGIP 5, 1, March 1976, 52-67.
BRODATZ, P. Textures: A Photographic Album for Artists and Designers. Toronto: Dover Publishing Co., 1966.

CONNORS, R. "Towards a set of statistical features which measure visually perceivable qualities of textures." Proc., PRIP, August 1979, 382-390.
COVER, T. M. "Estimation by the nearest neighbor rule." IEEE Trans. Information Theory 14, January 1968, 50-55.
FU, K. S. Sequential Methods in Pattern Recognition and Machine Learning. New York: Academic Press, 1968.
FU, K. S. Syntactic Methods in Pattern Recognition. New York: Academic Press, 1974.
FUKUNAGA, K. Introduction to Statistical Pattern Recognition. New York: Academic Press, 1972.
GIBSON, J. J. The Perception of the Visual World. Cambridge, MA: Riverside Press, 1950.
HALL, E. L., R. P. KRUGER, S. J. DWYER III, D. L. HALL, R. W. MCLAREN, and G. S. LODWICK. "A survey of preprocessing and feature extraction techniques for radiographic images." IEEE Trans. Computers 20, September 1971.
HARALICK, R. M. "Statistical and structural approaches to texture." Proc., 4th IJCPR, November 1978, 45-60.
HARALICK, R. M., R. SHANMUGAM, and I. DINSTEIN. "Textural features for image classification." IEEE Trans. SMC 3, November 1973, 610-621.
HOPCROFT, J. E. and J. D. ULLMAN. Introduction to Automata Theory, Languages and Computation. Reading, MA: Addison-Wesley, 1979.
JAYARAMAMURTHY, S. N. "Multilevel array grammars for generating texture scenes." Proc., PRIP, August 1979, 391-398.
JULESZ, B. "Textons, the elements of texture perception, and their interactions." Nature 290, March 1981, 91-97.
KENDER, J. R. "Shape from texture: a brief overview and a new aggregation transform." Proc., DARPA IU Workshop, November 1978, 79-84.
KRUGER, R. P., W. B. THOMPSON, and A. F. TWINER. "Computer diagnosis of pneumoconiosis." IEEE Trans. SMC 45, 1974, 40-49.
LAWS, K. I. "Textured image segmentation." Ph.D. dissertation, Dept. of Engineering, Univ. Southern California, 1980.
LU, S. Y. and K. S. FU. "A syntactic approach to texture analysis." CGIP 7, 3, June 1978, 303-330.
MALESON, J. T., C. M. BROWN, and J. A. FELDMAN. "Understanding natural texture." Proc., DARPA IU Workshop, October 1977, 19-27.
MILGRAM, D. L. and A. ROSENFELD. "Array automata and array grammars." Proc., IFIP Congress 71, Booklet TA-2. Amsterdam: North-Holland, 1971, 166-173.
PRATT, W. K., O. D. FAUGERAS, and A. GAGALOWICZ. "Applications of stochastic texture field models to image processing." Proc. of the IEEE 69, 5, May 1981.
ROSENFELD, A. "Isotonic grammars, parallel grammars and picture grammars." In MI6, 1971.
STEVENS, K. A. "Representing and analyzing surface orientation." In Artificial Intelligence: An MIT Perspective, Vol. 2, P. H. Winston and R. H. Brown (Eds.). Cambridge, MA: MIT Press, 1979.
STINY, G. and J. GIPS. Algorithmic Aesthetics: Computer Models for Criticism and Design in the Arts. Berkeley, CA: University of California Press, 1972.
TAMURA, H., S. MORI, and T. YAMAWAKI. "Textural features corresponding to visual perception." IEEE Trans. SMC 8, 1978, 460-473.
TOU, J. T. and R. C. GONZALEZ. Pattern Recognition Principles. Reading, MA: Addison-Wesley, 1974.
WESZKA, J. S., C. R. DYER, and A. ROSENFELD. "A comparative study of texture measures for terrain classification." IEEE Trans. SMC 6, 4, April 1976, 269-285.
ZUCKER, S. W. "Toward a model of texture." CGIP 5, 2, June 1976, 190-202.


Motion

7.1 MOTION UNDERSTANDING

Motion imagery presents many interesting challenges to computer vision, but static scene analysis received more attention in the 1960's and 1970's. In part, this may have been due to a technical problem: with most types of input media and domains, motion vision input is much more voluminous than static vision input. However, we believe that a more basic problem has been the assumption that motion vision could best be understood (or implemented) as many static frames analyzed very quickly, with results linked up in temporal sequence. This characterization of motion vision is extreme but perhaps illuminating. First, it assumes that vision involves processing static scenes. Second, it acknowledges that massive amounts of data may be required. Third, in it motion understanding degenerates to a postprocessing step which is mostly a matching operation: the differences or similarities between (understood) frames are analyzed and recorded. The extreme "static is basic" view is that motion is an unnaturally complex or difficult problem because it is ill-suited to the techniques available.

A modified view is that object motion provides good image cues for segmentation, much as color might. This approach leads to the use of motion for segmentation, so that motion gets a more basic role in the understanding process. In this view, motion as such is useful for basic image understanding; a motion image sequence may actually be easier to understand than a static image, because the effects of motion can help in segmentation. Recent examples may be found in [Snyder 1981].

A further departure from the "static is basic" view is that motion understanding is qualitatively different from static vision. A logical extreme of this view is that there are many visual processing operations whose primitives are points in motion, and that in fact static vision is the puzzle, being ill-suited to the needs and mechanisms of biological systems. Serious work in computer motion understanding has begun even more recently than computer vision as a whole, and it is too early to dismiss any approach out of hand. There are domains and applications in which the "static is basic" paradigm seems natural, but it also seems very reasonable that animals have perceptual systems or subsystems for which "motion is basic."

Section 7.2 is concerned with processing and understanding the "flow" of the world image across the retina. Section 7.3 considers several techniques for understanding sequences of static images.
7.1.1 Domain-Independent Understanding

Domain-independent motion processing extracts information from time-varying images using the weakest possible assumptions about the world. Processing that merely transforms the input data into another image-like structure is in the province of generalized image processing. However, if the motion processing aggregates spatial information on the basis of a common feature, then the processing is a form of segmentation.

The basic visual input for domain-independent work in motion vision understanding is optical flow. Although Helmholtz noted the striking immediacy of three-dimensional perception mediated through motion [Helmholtz 1925], Gibson is usually credited with pioneering the theory that a primary visual stimulus for motion is the flow of elements in the optic array, or pattern of luminance in the full sphere of solid angle surrounding the observer [Gibson 1950, 1957, 1965, 1966]. Human beings undoubtedly are sensitive to optical flow, as evidenced by the "looming" reflex [Schiff 1965], the effect of flow on balance [Lee and Lishman 1975], and many other documented phenomena [Nakayama and Loomis 1974].

The basic input to an "optical flow understander" is a continuously changing visual field, which may be considered a field of vectors, each expressing the instantaneous change of position on the optic array of the image of a world point. A field of such vectors is shown in Fig. 7.1. The extraction of the vectors from the changing image is a low-level operation often posited by optical flow research; one computational mechanism was given in Chapter 3. Flow may also be approximated in an image sequence by matching and difference operations (Section 7.3.1).
Computer vision researchers have recently begun to concern themselves with both the geometry and computational mechanisms that might be useful in the understanding of optical flow [Horn and Schunck 1980; Clocksin 1980; Prager 1979; Prazdny 1979; Lawton 1981]. Many formalisms are in use. Cartesian, polar, and spherical coordinates all have their appeal in different situations; differential vector geometry and simple analytic geometry are both used; even the geometry of the eye or camera varies from one study to another. This chapter does not contain a "unified flow theory"; instead it briefly describes several approaches, each of which uses a different aspect of optical flow.
7.1.2 Domain-Dependent Understanding

The use of models, or at least stronger assumptions about the world, is complementary to domain-independent processing. The changing image, or even the field of optical flow, can be treated as input to a model-driven vision process whose goal

Fig. 7.1 An example of an optical flow field for an approaching "hill." (a) The hill; (b) flow field.

is typically to segment the input into areas corresponding to meaningful world objects. The optical flow field becomes just another component of the generalized image, together with intensity, texture, or color. Motion often reveals information similar to that from range data; flow and range are discontinuous at object boundaries, surface orientation may be derived, and so forth. Object (or world) motions determine image (or retinal) motions; we shall be explicit about which motion we mean when confusion can occur.

Section 7.3 describes how knowledge of object motion phenomena can help in segmenting the flow field. One useful assumption is that the world contains rigid bodies. Tests for rigid bodies and calculations using data from them are quite useful; for example, the three-dimensional position of four points on a rigid object may be determined uniquely from three views (Section 7.3.2). A weaker object model, that objects are assemblies of compound rigid pendula (linkages), is enough to accomplish successful segmentation of very sparse motion input which consists only of images of the end points of links (Section 7.3.3). Section 7.3.4 describes work with a highly specific and detailed model which is used in several ways to restrict low-level image processing and aid in three-dimensional interpretation of human motion images. Section 7.3.5 considers the processing of sequences of segmented images.

The coherence of most three-dimensional objects and their continuity through time are two general principles which, although occasionally violated, guide many segmentation and point-matching heuristics. The assumed correspondence of regions in images with objects is one example. Motion images provide another example; object coherence implies the likelihood of many "continuity" (actually similarity) conditions on the positions and velocities of neighboring image points.

Here are five heuristics for use in matching points from images separated by a small time interval [Prager 1979] (Fig. 7.2).

1. Maximum velocity. If a world point is known to have a maximum velocity $V$ with respect to a stationary imaging device, then it can move at most $V\,dt$ between two images made $dt$ time units apart. Thus given the location of the point in one image (and some assumptions about depth), this constraint limits where the point can appear on the second image.

2. Small velocity change. Since most visible physical objects have finite mass, this heuristic is a consequence of physical laws and the assumption of a "small interval" between images. Of course, the definition of "small interval" depends on the definition of the velocity changes one desires to measure.

$
/

<y

^>
t, Maximum Velocity t 2

SmallVelocity Changes

/ /
/
Common Motion Consistent Match

\ \

/ > N^ \
Model

Fig. 7.2 Five heuristics. 198



3. Common motion. Spatially coherent objects often appear in successive images as regions of points sharing a "common motion." It is interesting that such a weak notion as common motion (and the related "common position") actually can serve to segment very sparse scenes of a few points with very complex motion behavior if a long enough sequence of images is used (Sections 7.3.3 and 7.3.4).

4. Consistent match. Two points from one image generally do not match a single point from another image (exceptions arise from occlusions). This is one of the main heuristics in the stereopsis algorithm described in Chapter 3.

5. Known motion. If a world model can supply information about object motions, perhaps retinal motions can be derived, predicted, and recognized.

In the discussions to follow these heuristics (and others) are often used or implicitly taken as principles. A careful catalog of the probable behavior of objects in motion is often a useful practical adjunct to a mathematical treatment. The mathematics itself must be based on a set of assumptions, and often these are closely related to the phenomenological heuristics noted above.
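Two of these heuristics, maximum velocity and consistent match, combine naturally into a simple point matcher. This sketch is illustrative, not Prager's method: it measures displacement in the image plane (standing in for the depth assumptions the text mentions), and it resolves remaining ambiguity greedily, shortest displacement first. The function `match_points` is hypothetical.

```python
import numpy as np


def match_points(points1, points2, v_max, dt):
    """Match image points between two frames dt apart.
    Maximum velocity: a candidate pair must be at most v_max * dt apart.
    Consistent match: each point is used at most once (one-to-one).
    Remaining ambiguity is resolved greedily, shortest displacement first.
    Distances are measured in the image plane, a simplifying assumption."""
    p1 = np.asarray(points1, dtype=float)
    p2 = np.asarray(points2, dtype=float)
    dist = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=2)
    radius = v_max * dt
    candidates = sorted((dist[i, j], i, j)
                        for i in range(len(p1)) for j in range(len(p2))
                        if dist[i, j] <= radius)
    used1, used2, matches = set(), set(), {}
    for _, i, j in candidates:
        if i not in used1 and j not in used2:
            matches[i] = j
            used1.add(i)
            used2.add(j)
    return matches


# The far point (50, 50) is excluded by the velocity bound.
m = match_points([(0, 0), (10, 0)], [(1, 0), (11, 1), (50, 50)], v_max=2.0, dt=1.0)
```

Here `m` maps each first-frame index to a second-frame index. The small-velocity-change and common-motion heuristics need three or more frames and could be added as further filters on the candidate list.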

7.2 UNDERSTANDING OPTICAL FLOW

This section describes some more direct calculations on optical flow, using no other input information. Information may be obtained from flow that seems useful both for survival in the world and (on a less existential level) for automated image understanding. As with shape from shading research (Chapter 3), the paradigm here is often to see mathematically what information resides in the input and to use this to suggest mechanisms for doing the computation. The flow input is assumed to be known (Chapter 3 showed how to derive optical flow by local analysis of changing intensity in the image).
7.2.1 Focus of Expansion

As one moves through a world of static objects, the visual world as projected on the retina seems to flow past. In fact, for a given direction of translatory motion and direction of gaze, the world seems to be flowing out of one particular retinal point, the focus of expansion (FOE). Each direction of motion and gaze induces a unique FOE, which may be a point at infinity if the motion is parallel to the retinal (image) plane.

These aspects of optical flow have been studied by computing the simulated flow pattern an observer would see while moving through a "forest" of vertical cylinders [Prager 1979] or Gaussian hills and valleys [Lawton 1981]. Some sample FOEs are shown in Fig. 7.3. Figure 7.3c shows a second FOE when the field of view contains an object which is itself in motion.

Our first model of the imaging situation is a simplification of the imaging geometry given in Appendix 1. Let the viewpoint be at the origin with the view


Fig. 7.3 FOE for rectilinear observer motion. (a) An image; (b) later image; (c) flow shows different FOEs for static floor and moving object.

direction out along the positive $Z$ axis, and let the focal length $f = 1$. Then the perspective distortion equations simplify to

$$x' = \frac{x}{z} \qquad (7.1)$$

$$y' = \frac{y}{z} \qquad (7.2)$$

In the next two sections the letters $u$, $v$, and $w$ (sometimes written as functions of $t$) denote world point velocity components, or the time derivatives of world coordinates $(x, y, z)$. Observer motion with instantaneous velocity $(dx/dt, dy/dt, dz/dt) = (-u, -v, -w)$, keeping the coordinate system attached to the viewpoint, gives points in a stationary world a relative velocity $(u, v, w)$. Consider a point located at $(x_0, y_0, z_0)$ at some initial time. After a time interval $t$, its image will be at

$$(x', y') = \left(\frac{x_0 + ut}{z_0 + wt},\ \frac{y_0 + vt}{z_0 + wt}\right) \qquad (7.3)$$


As $t$ varies, this parametric "flow path" equation is that of a straight line; as $t$ goes to minus infinity, the image of the point travels back along the straight line toward a particular point on the image, namely,

$$\text{FOE} = \left(\frac{u}{w},\ \frac{v}{w}\right) \qquad (7.4)$$

This focus of expansion is where the optical flow originates on the image. If the observer changes direction (or objects in the world change their direction), the FOE changes as well.
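Eqs. (7.3) and (7.4) can be checked by simulation, and they also suggest a simple way to locate the FOE from observed motion: every flow path is a straight line through the FOE, so the lines through pairs of image positions should all intersect there. The least-squares intersection below is an illustrative estimator chosen for this sketch, not a method from the text; the function names and world points are hypothetical.

```python
import numpy as np


def image_of(p0, vel, t):
    """Eq. (7.3): image (x', y') at time t of a point starting at p0 with
    relative velocity vel = (u, v, w); focal length 1, Eqs. (7.1)-(7.2)."""
    p = np.asarray(p0, dtype=float) + t * np.asarray(vel, dtype=float)
    return p[:2] / p[2]


def foe(vel):
    """Eq. (7.4): focus of expansion (u/w, v/w); a point at infinity if w = 0."""
    u, v, w = vel
    return np.array([u / w, v / w])


def estimate_foe(starts, ends):
    """Least-squares intersection of straight flow paths, each given by an
    image point at two times. Each path passes through the FOE, so its
    normal n satisfies n . FOE = n . start (illustrative estimator)."""
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    d = ends - starts
    n = np.stack([-d[:, 1], d[:, 0]], axis=1)
    b = np.sum(n * starts, axis=1)
    sol, *_ = np.linalg.lstsq(n, b, rcond=None)
    return sol


vel = (0.2, -0.1, -1.0)                     # world points approach the camera
world = [(1, 2, 5), (-3, 1, 8), (2, -2, 6), (0.5, 3, 10)]
starts = [image_of(p, vel, 0.0) for p in world]
ends = [image_of(p, vel, 0.5) for p in world]
est = estimate_foe(starts, ends)            # approximately foe(vel) = (-0.2, 0.1)
```

With real, noisy flow vectors the lines no longer meet exactly, which is where a least-squares (or Hough-style voting) formulation earns its keep.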

7.2.2 Adjacency, Depth, and Collision

The flow path equation of a point moving with a constant velocity reveals information about its depth in $z$. The information is not provided directly, since all flow paths for points at a given depth do not look alike. However, there is the elegant relation

$$\frac{D(t)}{V(t)} = \frac{z(t)}{w(t)} \qquad (7.5)$$

Here again $w$ is $dz/dt$, and $V$ is $dD/dt$. $D$ is the distance along the straight flow path from the FOE to the image of the point. Thus the distance/velocity ratio of the point's image is the same as the distance/velocity ratio of the world point. This result is basic, but perhaps not immediately obvious.

The above relation is called the time-to-adjacency relation, because the right-hand side, $z/w$, is the $z$ distance of the point from the image plane divided by its velocity toward the plane. It is thus the time until the point passes through the image plane. This basic time interval is clearly useful when dealing with world objects; it changes when the magnitude of the world point's velocity (or the observer's) changes.

Knowing the depth of any point determines the depth of all others of the same velocity $w$, for it follows from the two time-to-adjacency equations of the points that

$$z_1(t) = \frac{z_2(t)\, D_1(t)\, V_2(t)}{V_1(t)\, D_2(t)} \qquad (7.6)$$
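The time-to-adjacency relation can be verified numerically: measure $D$ and $dD/dt$ purely from the image of a moving point and compare with the world quantity. With $w = dz/dt$ taken literally, a point approaching the camera has $w < 0$ while its image moves away from the FOE, so this sketch compares magnitudes, $D/V$ against $z/|w|$, the time until the point reaches the image plane. The central-difference derivative and the sample point are illustrative choices.

```python
import numpy as np


def image_of(p0, vel, t):
    """Eq. (7.3) with focal length 1."""
    p = np.asarray(p0, dtype=float) + t * np.asarray(vel, dtype=float)
    return p[:2] / p[2]


def image_time_to_adjacency(p0, vel, t, h=1e-6):
    """D / V: image distance from the FOE divided by its rate of change,
    with V = dD/dt estimated by a central difference of step h."""
    u, v, w = vel
    foe = np.array([u / w, v / w])

    def D(s):
        return np.linalg.norm(image_of(p0, vel, s) - foe)

    V = (D(t + h) - D(t - h)) / (2.0 * h)
    return D(t) / V


p0, vel, t = (1.0, 2.0, 10.0), (0.5, 0.2, -2.0), 1.0
z_t = p0[2] + vel[2] * t                          # depth at time t: 8.0
tau_image = image_time_to_adjacency(p0, vel, t)   # measured from the image alone
tau_world = z_t / abs(vel[2])                     # 8.0 / 2.0 = 4.0
```

The image-only estimate needs no knowledge of depth or speed separately, which is what makes the relation attractive for collision avoidance.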

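The time-to-adjacency equation allows easy determination of the world coordinates of a point, scaled by its $z$ velocity. If the observer is mobile and in control of his own velocity, and if the world is stationary, such scaled coordinates may be useful. Using the perspective distortion equations,

$$\frac{x(t)}{w(t)} = \frac{x'(t)\, D(t)}{V(t)} \qquad (7.7)$$

$$\frac{y(t)}{w(t)} = \frac{y'(t)\, D(t)}{V(t)} \qquad (7.8)$$

$$\frac{z(t)}{w(t)} = \frac{D(t)}{V(t)} \qquad (7.9)$$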


As a last example, let us relate optical flow to the sensing of impending collisions with world objects. The focal point of the imaging system, or origin of coordinates, is at any instant headed "toward the focus of expansion," whose image coordinates are $(u/w, v/w)$. It is thus traveling in the direction

$$\mathbf{O} = \left(\frac{u}{w}, \frac{v}{w}, 1\right) \tag{7.10}$$

and is following at any instant a path in the environment instantaneously defined by the parametric equation

$$(x, y, z) = t\,\mathbf{O} = t\left(\frac{u}{w}, \frac{v}{w}, 1\right) \tag{7.11}$$

where $t$ acts like a real scalar measure of time. Given this vector expression for the path of the observer, one can apply well-known vector formulas from analytic solid geometry to derive useful information about the relation of this path to world points, which are also vectors. For example, the position $\mathbf{P}$ along the observer's path at which a world point approaches closest is given by

$$\mathbf{P} = \frac{\mathbf{x} \cdot \mathbf{O}}{\mathbf{O} \cdot \mathbf{O}}\, \mathbf{O} \tag{7.12}$$

where $\mathbf{O}$ is the direction of observer motion and $\mathbf{x}$ the position of the world point. Here the period ($\cdot$) is the dot product operator. The squared distance $Q^2$ between the observer and the world point at closest approach is then

$$Q^2 = (\mathbf{x} \cdot \mathbf{x}) - \frac{(\mathbf{x} \cdot \mathbf{O})^2}{\mathbf{O} \cdot \mathbf{O}} \tag{7.13}$$
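Equations (7.12) and (7.13) translate directly into a few lines of code. A minimal sketch, assuming the direction of travel is built from hypothetical flow parameters $u$, $v$, $w$ via Eq. (7.10); a small $Q^2$ signals an impending collision.

```python
import numpy as np


def closest_approach(x, O):
    """Closest approach of world point x to the straight path t * O (Eqs. 7.12 and 7.13)."""
    x = np.asarray(x, dtype=float)
    O = np.asarray(O, dtype=float)
    P = (x @ O) / (O @ O) * O                 # point on the path nearest to x
    Q2 = x @ x - (x @ O) ** 2 / (O @ O)       # squared miss distance
    return P, Q2


u, v, w = 0.1, 0.0, 1.0                       # hypothetical flow parameters
O = np.array([u / w, v / w, 1.0])             # direction of travel, Eq. (7.10)
P, Q2 = closest_approach([3.0, 0.0, 10.0], O)
print(P, Q2)
```

As a consistency check, $Q^2$ equals the squared length of $\mathbf{x} - \mathbf{P}$, and a point lying on the path (for example $(1, 0, 10)$ here) gives $Q^2 = 0$.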
7.2.3 Surface Orientation and Edge Detection

It is possible to derive surface orientation and to characterize certain types of surface discontinuities (edges) by their motion. A formalism, computer program, and biologically motivated computational mechanism for these calculations were developed in [Clocksin 1980]. This section outlines mainly the surface orientation aspect of this work. As usual, the model is for a monocular observer, whose focal point is the origin of coordinates. An unusual feature of the model is that the observer has a spherical retina. The world is thus projected onto an "image unit sphere" instead of an image plane. World points and surface orientation are represented in an observer-centered Cartesian coordinate system. The image sphere has a spherical coordinate system which may be considered as "longitude" $\theta$ and "latitude" $\phi$. These coordinates bear no relation to the orientation of the retina. World points are then determined by their image coordinates and a range $r$. An observer-centered Cartesian coordinate system is also useful; it is related to the sphere as shown in Fig. 7.4, and by the transformations given in Appendix 1.

Ch. 7 Motion

Fig. 7.4 Spherical coordinate system, and the definition of $\sigma$ and $\tau$.

The flow of the image of a freely moving world point may be found through the following derivation. As before, let the world velocity of the point (possibly induced by observer motion) $(dx/dt, dy/dt, dz/dt)$ be written $(u, v, w)$. Similarly, write the angular velocities of the image point in the $\theta$ and $\phi$ directions as

$$\dot\theta = \frac{d\theta}{dt} \tag{7.14}$$

$$\dot\phi = \frac{d\phi}{dt} \tag{7.15}$$

Then from the coordinate transformation equations of Appendix 1,

$$y = x \tan\theta \tag{7.16}$$

Differentiating and solving for $d\theta/dt$ (written as $\dot\theta$) gives

$$\dot\theta = \frac{v - u\tan\theta}{x\sec^2\theta} \tag{7.17}$$

Substituting for $x$ its spherical coordinate expression $r\sin\phi\cos\theta$ and simplifying yields the general expression for flow in the $\theta$ direction:

$$\dot\theta = \frac{v\cos\theta - u\sin\theta}{r\sin\phi} \tag{7.18}$$

The derivation of $\dot\phi$ proceeds from the coordinate transformation equation

$$z = r\cos\phi \tag{7.19}$$

Differentiating, solving for $d\phi/dt$ (written as $\dot\phi$), and using

$$\frac{dr}{dt} = \frac{xu + yv + zw}{r} \tag{7.20}$$

yields the general expression for flow in the $\phi$ direction:

$$\dot\phi = \frac{(xu + yv + zw)\cos\phi - r w}{r^2\sin\phi} \tag{7.21}$$
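Equations (7.18) and (7.21) can be checked against a direct finite-difference estimate of how $\theta$ and $\phi$ change as the world point moves. The sketch assumes the spherical convention $x = r\sin\phi\cos\theta$, $y = r\sin\phi\sin\theta$, $z = r\cos\phi$, which agrees with Eqs. (7.16) and (7.19); the point and velocity are arbitrary test values.

```python
import numpy as np


def spherical_flow(X, U):
    """Image-sphere flow (theta_dot, phi_dot) of world point X moving with velocity U.

    Implements Eqs. (7.18) and (7.21) with x = r sin(phi) cos(theta),
    y = r sin(phi) sin(theta), z = r cos(phi).
    """
    x, y, z = X
    u, v, w = U
    r = np.linalg.norm(X)
    theta = np.arctan2(y, x)
    phi = np.arccos(z / r)
    theta_dot = (v * np.cos(theta) - u * np.sin(theta)) / (r * np.sin(phi))
    phi_dot = ((x * u + y * v + z * w) * np.cos(phi) - r * w) / (r ** 2 * np.sin(phi))
    return theta_dot, phi_dot


def angles(X):
    """Longitude theta and polar angle phi of a world point."""
    return np.arctan2(X[1], X[0]), np.arccos(X[2] / np.linalg.norm(X))


X = np.array([1.0, 2.0, 5.0])      # hypothetical world point
U = np.array([0.3, -0.4, 0.8])     # hypothetical world velocity
h = 1e-6                           # small time step for a central difference
th_p, ph_p = angles(X + h * U)
th_m, ph_m = angles(X - h * U)
numeric = ((th_p - th_m) / (2 * h), (ph_p - ph_m) / (2 * h))
analytic = spherical_flow(X, U)
print(numeric, analytic)
```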

As usual, general point motions are rather complicated to deal with, and more constraints are needed if the optic flow is to be "inverted" to discover much about the outside world. Let us then make the simplification that the world is stationary and the observer is traveling along the $z$ direction at some speed $S$. (This assumption is briefly discussed below.) Explicitly, suppose that

$$u = 0, \qquad v = 0, \qquad w = -S$$

Substituting these into the general flow equations (7.18) and (7.21) yields simplified flow equations:

$$\dot\theta = 0 \tag{7.22}$$

$$\dot\phi = \frac{S\sin\phi}{r} \tag{7.23}$$

Since $r$ is a function of $\theta$ and $\phi$, so is $\dot\phi$. It is this simplified flow equation which forms the basis for surface orientation calculation and edge detection. The goals are to assign to any point in the flow field one of three interpretations (edge, surface, or space) and also to derive the type of edge and the orientation of the surface.

To find surface orientation, represent the surface normal of a surface by two angles $\sigma$ and $\tau$ defined as in Fig. 7.4, with the two planes of $\sigma$ and $\tau$ being the RZ and QR planes, respectively. The slant is measured relative to the line of sight, denoted by R in the figure. $\sigma$ and $\tau$ correspond to depth changes in "depth profiles" oriented along lines of constant $\theta$ and $\phi$, respectively. Thus,

$$\tan\sigma = \frac{1}{r}\frac{\partial r}{\partial\phi} \tag{7.24}$$

$$\tan\tau = \frac{1}{r}\frac{\partial r}{\partial\theta} \tag{7.25}$$

Surface orientation is defined by $\sigma$ and $\tau$, or equivalently by their tangents. A surface perpendicular to the line of sight has $\sigma = \tau = 0$. Equations (7.24) and (7.25) assume the range $r$ is known. However, one can determine them without knowing $r$ through the simplified flow equation, Eq. (7.23). The latter may be written

$$r = \frac{S\sin\phi}{\dot\phi(\theta, \phi)}$$

where $\dot\phi(\theta, \phi)$ gives the flow in the $\phi$ direction. Differentiating this with respect to $\theta$ and $\phi$ gives


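$$\frac{\partial r}{\partial\phi} = \frac{S\left(\dot\phi\cos\phi - \sin\phi\,\partial\dot\phi/\partial\phi\right)}{\dot\phi^2} \tag{7.26}$$

$$\frac{\partial r}{\partial\theta} = -\frac{S\sin\phi\,\partial\dot\phi/\partial\theta}{\dot\phi^2} \tag{7.27}$$

These last three equations may be substituted into Eqs. (7.24) and (7.25), and the results may then be simplified to the following surface orientation equations:

$$\tan\sigma = \cot\phi - \frac{\partial(\ln\dot\phi)}{\partial\phi} \tag{7.28}$$

$$\tan\tau = -\frac{\partial(\ln\dot\phi)}{\partial\theta} \tag{7.29}$$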
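Because Eqs. (7.28) and (7.29) need only derivatives of $\ln\dot\phi$, they can be evaluated on a sampled flow field. A minimal sketch under assumed conditions: a hypothetical plane, observer translation along $+z$ at speed $S$, $\dot\phi$ synthesized from Eq. (7.23), and derivatives taken with `np.gradient`. It compares the flow-only estimate with Eqs. (7.24) and (7.25) evaluated on the true range, and shows that changing $S$ leaves the estimate unchanged.

```python
import numpy as np

# Hypothetical scene: the plane n . X = d, observer moving along +z at speed S.
n_plane = np.array([0.2, 0.1, 1.0])
d_plane = 10.0
theta = np.linspace(0.2, 1.0, 161)                  # longitude samples
phi = np.linspace(0.3, 0.8, 161)                    # angle from the z axis
TH, PH = np.meshgrid(theta, phi, indexing="ij")
rays = np.stack([np.sin(PH) * np.cos(TH),
                 np.sin(PH) * np.sin(TH),
                 np.cos(PH)], axis=-1)              # unit viewing directions
r = d_plane / (rays @ n_plane)                      # range to the plane along each ray


def orientation_from_flow(phi_dot):
    """tan(sigma), tan(tau) from the phi flow alone, Eqs. (7.28) and (7.29)."""
    dlog_dtheta, dlog_dphi = np.gradient(np.log(phi_dot), theta, phi, edge_order=2)
    return 1.0 / np.tan(PH) - dlog_dphi, -dlog_dtheta


def orientation_from_range(r):
    """tan(sigma), tan(tau) from the range itself, Eqs. (7.24) and (7.25)."""
    dr_dtheta, dr_dphi = np.gradient(r, theta, phi, edge_order=2)
    return dr_dphi / r, dr_dtheta / r


# Flow from Eq. (7.23) at two speeds; the orientation estimate should not depend on S.
results = {S: orientation_from_flow(S * np.sin(PH) / r) for S in (1.0, 5.0)}
ref_sigma, ref_tau = orientation_from_range(r)
err = np.max(np.abs(results[1.0][0] - ref_sigma))
print(f"max |tan(sigma) error| = {err:.2e}")
```

The residual error comes from the finite-difference derivatives, not from the method; neither $r$ nor $S$ enters `orientation_from_flow`.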

These tangents are thus easily computed from optical flow. The result does not depend on velocity, and no depth scaling is required. In fact, absolute depth is not computable unless we know more, such as the observer speed.

Turning briefly to edge perception: although physical edges are a depth phenomenon, in flow they are mirrored by $\dot\phi$, the flow measure that allows determination of orientation without depth. In particular, it is possible to demonstrate that the Laplacian of $\dot\phi$ has singularities where the Laplacian of depth has singularities. An arc on the sphere projects out onto a "depth profile" in the world, along which depth may vary. If the arc is parameterized by $\alpha$, relations among the depth profile, flow profile, and the singularities in flow are shown in Fig. 7.5. Thus the Laplacian of $\dot\phi$ provides information about edge type but not about edge depth.

The formal derivations are at an end. Implementing them in a computer program or in a biological system requires solutions to several technical problems. More details on the implementation of this model on a computer and a possible implementation using low-level physiological vision primitives appear in [Clocksin 1980].

Fig. 7.5 The singularities of the second derivative of the flow profile inform about the type of edge. (Panels: range profile, flow profile, and theoretical edge signature.)

Sec. 7.2 Understanding Optical Flow

There are some data on human performance for the types of tasks attempted by the program. The assumption of a fixed environment basically implies that flow motions in the environment are likely to be interpreted as observer motions. This view is rather strikingly borne out by "swaying room" experiments [Lee and Lishman 1975], in which a subject stands in a swayable visual environment. (A large, low-mass bottomless box suspended from above may be lowered around the subject, giving him a roomlike visual environment.) When the hanging "room" is made to sway, the subject inside tends to lose balance. Further, moving surfaces in the real world are quite often objects of interest, such as animals. A survey of depth perception experiments [Braunstein 1976] points to motion as the dominant indicator of surface orientation perception. Random-dot displays of monocular flow patterns [Rogers and Graham 1979] evoke striking perceptions of solid oriented surfaces; flow may be adequate for shape and depth perception even with no other depth information. The experiments on perception of "edges," or discontinuities in flow caused by discontinuities in depth of textured surfaces, are less common. However, there have been enough to provide some confirmation of the model.

The computational model is consistent with and has correctly predicted psychological data on human thresholds for slant and edge perception in optical flow fields. (The thresholds are on the amount of slant to the surface and the depth difference of the edge sides.) The computational model can be used to determine range, but only to poor accuracy; this happens to correspond with the human trait that orientation is much more accurately determined by flow than is range. Quantitatively, the accuracy of orientation and range determinations is the same for the model and for human beings under similar conditions.
7.2.4 Egomotion

It is possible to extract information about complex observer motions from optical flow, although at considerable computational cost. In one formulation [Prazdny 1979], a model observer is allowed to follow any space curve in an environment of stationary objects, while at the same time turning its head. It is possible to derive formulae that determine the observer's instantaneous velocity vector and head rotational vector from a small number (six) of flow vectors in the image on a (standard flat) retina.

The equations that describe flow given observer motion and head rotation can be quite compactly written by using vector operators and a polar coordinate system (similar to that of the last section). The inherent elegance and power of the vector operations is well displayed in these calculations. Inverting the equations results in a system of three cubic equations of 20 terms each. Such a system can be solved by normal methods for simultaneous nonlinear equations, but the solutions tend to be relatively sensitive to noise. In the noise-free case, the method seems to perform quite adequately.

The calculation yields a method for deriving relative depth, or the ratio of the distances of points from the observer. An approximation to surface orientation may be obtained using several relative depth measurements in a small area and assuming that the surface normal varies slowly in the area.
7.3 UNDERSTANDING IMAGE SEQUENCES

An image sequence is an ordered set of images. The image sequences of interest here are samplings of four-dimensional spacetime. Commonly, as in a movie, the images are two-dimensional projections of a three-dimensional physical world, sequenced through time. Sometimes the sequence consists of two-dimensional images of essentially two-dimensional slices of the three-dimensional world, sequenced through the third spatial dimension. Some of the techniques in this section are useful in interpreting the three-dimensional nature of objects from such spatial image sequences, but the main concern here is with temporal image sequences. In many practical applications, the input must be such a sequence, and continuous motion must be inferred from discrete location differences of image points. The thrust of work under these assumptions is often to extend static image understanding by making models that incorporate or explain objects in motion, extending segmentation to work across time [Thompson 1979, Tsotsos 1980].

When asked why he was listening to a metronome ticking, Ezra Pound is said to have replied that he did not listen to the ticks, but to the "spaces between them." Like Pound, we take the ticks, or images, as given, and are really interested in what goes on "between the ticks." We usually want to determine and describe how the images are related to each other. This information must be derived from the static images, and two approaches immediately present themselves: broadly, the first is to look for differences between the images, and the second is to look for similarities. These two approaches are complementary, and are often used together.

A general paradigm for object-oriented motion analysis is the following:

1. Segment (describe) the individual images. This process may be complex, yielding a relational structure or a segmentation into regions or edges. An important special case is the one in which the description (segmentation) process is null and the description is just the image itself. For example, an initial high-level static description is impossible if motion is to be used as an aid to segmentation.
2. Compute and describe the differences or similarities between the descriptions (or undescribed images).
3. Build a description of the sequence as a whole from the single-frame primitives and descriptions of difference or similarity that are relevant to the purpose at hand.
7.3.1 Calculating Flow from Discrete Images

This method is a form of disparity calculation that is not only used for flow calculations, but may also be used for stereo matching or tracking applications. The computations are implemented with "relaxation" techniques.

Sec. 7.3 Understanding Image Sequences

The flow calculations have so far assumed an underlying continuous image which was densely sampled. With those assumptions and a few more, the fundamental motion equation allows the calculation of flow (Chapter 3). The approach of this section is to identify discrete points in the image that are very different from their surround. Given such discrete points from each of two images at different times, the problem becomes one of matching a point in one image with the right point (if it exists) in the other image. This matching problem is known as the correspondence problem [Duda and Hart 1973, Aggarwal et al. 1981]. The solution to the correspondence problem in the case of motion is, of course, the optic flow.

One algorithm for matching distinct points from two different frames [Barnard and Thompson 1979] breaks the matching problem into two steps. The first is the identification of candidate match points in each of the two frames. The second is an iterative algorithm which adjusts match probabilities for pairs of match points. After successful termination of the algorithm, correct matches have high probabilities and incorrect matches have very low probabilities.

The Moravec interest operator ([Moravec 1977]; Section 3.2) produces candidate match points by measuring the distinctness of a local piece of the image from its surround. Each frame is analyzed separately, so that the end result is two sets of points $S_1$ and $S_2$, one from each frame, which are candidates to be matched. Candidates in $S_1$ are indexed by $i$ and those in $S_2$ by $j$.

The iterative part of the algorithm is initialized with a data structure for the possible matches that exploits the heuristic that a point in the world does not move large distances between frames. Potential matches for a given point $\mathbf{x}_i$ in $S_1$, the first image, are all points $\mathbf{y}_j$ in $S_2$ such that

$$\|\mathbf{x}_i - \mathbf{y}_j\| \le v_{\max} \tag{7.30}$$

where $v_{\max}$ is the maximum disparity allowed between points. All points selected by the Moravec operator that satisfy (7.30) have a disparity vector $\mathbf{v}_{ij}$ and are kept as possible matches.
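Each disparity has an associated probability $P_{ij}$, which changes through time as the most likely disparities are found. The information kept for each point $\mathbf{x}_i$ in $S_1$ looks like

$$\left(\mathbf{x}_i, (\mathbf{v}_{i1}, P_{i1}), (\mathbf{v}_{i2}, P_{i2}), \ldots, (V^*, P^*)\right) \tag{7.31}$$

where $V^*$ is a special symbol that denotes "no match," and all the $\mathbf{y}_j$ are members of $S_2$. Storing the flow vectors $\mathbf{v}_{ij}$ implicitly stores the corresponding point in $S_2$, since $\mathbf{y}_j = \mathbf{x}_i + \mathbf{v}_{ij}$. Since the probabilities are adjusted iteratively, one final index is needed to denote the iteration value, so that $P_{ij}$ actually becomes $P^n_{ij}$ for $n \ge 0$.

The initial approximation for the probabilities $P^0_{ij}$ takes advantage of the "common motion" heuristic: if $\mathbf{y}_j$ is the correct match point for $\mathbf{x}_i$, the image near $\mathbf{y}_j$ should look like the image near $\mathbf{x}_i$. Thus $P^0_{ij}$ can be defined by

$$P^0_{ij} = \frac{1}{1 + c\,w_{ij}} \quad \text{for } \mathbf{x}_i \text{ in } S_1 \tag{7.32}$$

where

$$w_{ij} = \sum_{\|\mathbf{dx}\| \le k} \left[f(\mathbf{x}_i + \mathbf{dx}, t_1) - f(\mathbf{y}_j + \mathbf{dx}, t_2)\right]^2 \tag{7.33}$$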

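and $c$ is a constant. The updating formula is complex in form but basically is a weighted sum of neighboring match probabilities, where the neighboring match is consistent (i.e., has nearly the same velocity). A neighboring match $kl$ (point $\mathbf{x}_k$ with disparity $\mathbf{v}_{kl}$) is consistent with $ij$ if

$$\|\mathbf{v}_{ij} - \mathbf{v}_{kl}\| \le D_v \tag{7.34}$$

The goodness of a particular match is measured by $q_{ij}$, where

$$q_{ij} = \sum_{k \text{ a neighbor of } i} \;\; \sum_{l \text{ s.t. } kl \text{ satisfies (7.34)}} P^{n-1}_{kl} \tag{7.35}$$

and the probabilities are updated by

$$\tilde{P}^n_{ij} = P^{n-1}_{ij}\,(A + B\,q_{ij}) \tag{7.36}$$

$$P^n_{ij} = \frac{\tilde{P}^n_{ij}}{\sum_{j \text{ s.t. } ij \text{ is a match}} \tilde{P}^n_{ij}} \tag{7.37}$$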
where the function of Eq. (7.37) is to renormalize the probabilities, and $A$ and $B$ are constants.

The following simplified example makes these ideas more concrete. Consider the situation given in Fig. 7.6, where the points in (a) are from $S_1$ and the points in (b) are from $S_2$. Using hypothetical values for $P$, an initial match data structure is, in terms of Eq. (7.31):

((4, 10) ((5, 0), 0.7) ((4, -5), 0.25) ((2, -8), 0.05))
((4, 6) ((5, 4), 0.5) ((4, -1), 0.3) ((2, -4), 0.2))
((2, 3) ((7, 7), 0.3) ((6, 2), 0.35) ((4, -1), 0.2))

Fig. 7.6 Discrete matching: a concrete example. (a) Points of $S_1$, labeled $i = 1, 2, 3$. (b) Points of $S_2$, labeled $j = 1, 2, 3$.



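Also, $D_v = 1$, using the chessboard norm. Using the updating formula (7.35), the first set of $q_{ij}$'s is given by

$$[q_{ij}] = \begin{bmatrix} 0.3 & 0.2 & 0 \\ 0 & 0.9 & 0.25 \\ 0 & 0 & 0.3 \end{bmatrix}$$

and the corresponding unnormalized probabilities, with $A = 0.3$ and $B = 3$, are

$$[\tilde{P}^1_{ij}] = \begin{bmatrix} 1.11 & 0.875 & 0.0151 \\ 0.15 & 2.79 & 0.80 \\ 0.09 & 0.105 & 0.65 \end{bmatrix}$$

which are normalized to be

$$[P^1_{ij}] = \begin{bmatrix} 0.55 & 0.44 & 0.01 \\ 0.04 & 0.75 & 0.21 \\ 0.11 & 0.12 & 0.74 \end{bmatrix}$$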

So after one iteration the match structure is already starting to converge to the best match of $P_{ii} = 1$, $P_{ij} = 0$ for $i \ne j$. Note that in general $P_{ij}$ and $q_{ij}$ are, in matrix form, sparse due to the consistency condition (7.34). To see the results for an example of a more appropriate scale, consult Fig. 7.7.
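The pieces of the matching algorithm can be assembled into a short program. The sketch below is a simplified reading of Eqs. (7.30) to (7.37), not Barnard and Thompson's implementation: it omits the "no match" entry $V^*$, normalizes the initial probabilities per point, uses the chessboard norm throughout, and treats every other point of $S_1$ within a hypothetical `radius` as a neighbor. The test frames are random, the second a pure translation of the first, and the interest points are chosen by hand rather than by the Moravec operator.

```python
import numpy as np


def chessboard(a, b):
    """Chessboard (max-coordinate) distance between two integer vectors."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))


def initial_probabilities(S1, S2, f1, f2, v_max, k, c):
    """Candidates by Eq. (7.30); P0 = 1 / (1 + c w_ij), Eqs. (7.32)-(7.33), normalized per point."""
    V, P = [], []
    for x in S1:
        cands = [j for j, y in enumerate(S2) if chessboard(x, y) <= v_max]
        V.append({j: S2[j] - x for j in cands})              # disparities v_ij = y_j - x_i
        raw = {}
        for j in cands:
            y = S2[j]
            a = f1[x[0] - k:x[0] + k + 1, x[1] - k:x[1] + k + 1]
            b = f2[y[0] - k:y[0] + k + 1, y[1] - k:y[1] + k + 1]
            raw[j] = 1.0 / (1.0 + c * np.sum((a - b) ** 2))  # w_ij is the window SSD
        total = sum(raw.values())
        P.append({j: p / total for j, p in raw.items()})
    return V, P


def relax(S1, V, P, A=0.3, B=3.0, D_v=1.0, radius=10.0, n_iter=10):
    """Iterate Eqs. (7.35)-(7.37); neighbours are other S1 points within `radius`."""
    nbrs = [[m for m in range(len(S1)) if m != i and chessboard(S1[i], S1[m]) <= radius]
            for i in range(len(S1))]
    for _ in range(n_iter):
        new_P = []
        for i in range(len(S1)):
            p_tilde = {}
            for j, v in V[i].items():
                q = sum(P[m][l] for m in nbrs[i] for l, w in V[m].items()
                        if chessboard(v, w) <= D_v)              # consistency, Eq. (7.34)
                p_tilde[j] = P[i][j] * (A + B * q)               # Eq. (7.36)
            total = sum(p_tilde.values())
            new_P.append({j: p / total for j, p in p_tilde.items()})  # Eq. (7.37)
        P = new_P
    return P


rng = np.random.default_rng(0)
f1 = rng.random((30, 30))
shift = np.array([1, 2])
f2 = np.roll(f1, tuple(shift), axis=(0, 1))                 # every pixel moves by (1, 2)
S1 = np.array([[10, 10], [10, 13], [13, 10], [13, 13]])     # hypothetical interest points
S2 = S1 + shift
V, P0 = initial_probabilities(S1, S2, f1, f2, v_max=4, k=2, c=1.0)
P = relax(S1, V, P0)
for i, probs in enumerate(P):
    print(i, {j: round(p, 3) for j, p in probs.items()})
```

Because every correct match shares the disparity $(1, 2)$, each one gathers support from all of its neighbors, and after a few iterations its probability approaches 1 while the competing candidates fade.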
7.3.2 Rigid Bodies from Motion

The human visual system is predisposed to interpret (perceive) two-dimensional projections of moving three-dimensional rigid objects as just that: moving rigid objects. This facility is an interesting one, since it persists even when all three-dimensional information is removed from any single static view. This sort of result has been known for some time [Wallach and O'Connell 1953; Johansson 1964]. The ability to interpret points as three-dimensional objects demonstrated by Johansson means that the interpretation process does not rely solely on monitoring the changes of angles and length of lines, as suggested by Wallach and O'Connell.

Of course any change between two two-dimensional projections of points in three dimensions can be explained by any number of configurations and motions. Our visual system only accepts a few interpretations, often only one. This one is, in the world of moving objects in which we live, usually correct. This ability to reject unlikely interpretations is consistent with a "rigidity assumption" [Ullman 1979]:

Any set of elements undergoing a two-dimensional transformation which has a unique interpretation as a rigid body moving in space should be so interpreted.

It seems likely that something like this rigidity assumption is built into our visual system. However, saying that does not tell us much about how it could possibly work. Below we consider the problem of obtaining three-dimensional structure from sets of corresponding two-dimensional points.

One related area of work is the reconstruction of three-dimensional structure when the corresponding points in two dimensions are not known. The reconstruction procedure must begin by matching points in the several views. It can be shown [Shapira 1974] that general wire-frame objects of straight wires (of which the edges of polyhedra are only a special case) may be reconstructed from a finite number of perspective projections, but that for general wire-frame objects, the number of projections needed may be quite large. In fact, given any set of projections (viewpoints and viewing planes), an object may be constructed that is only ambiguously specified by those projections. Further work on reconstruction from projections is reported in [Shapira and Freeman 1978, Wesley and Markovsky 1981].

Fig. 7.7 Optical flow from feature point analyses. (a) An image. (b) Later image. (c) Optical flow found by relaxation.

If point correspondences are known, it is possible to compute a unique

three-dimensional location of four noncoplanar points from just three (orthographic) projections [Ullman 1979]. If the projections result from noncoplanar viewpoints, the recovery of three-dimensional structure is straightforward and is outlined below. If the projections are from coplanar viewpoints, the computations become more complex but still yield a unique result up to reflection. This second case is an important one; it applies if the camera is stationary and the object revolves about a single axis, for instance. Since the reconstruction is unique, the method never gets a wrong structure from accurate two-dimensional evidence about a rigid body. The probability that three views of four nonrigidly connected points can be interpreted as a rigid body is very low. Thus, the method is unlikely to report structure that is not there.

The method may be heuristically extended to multiple objects. Given the capability of describing the three-dimensional structure of four points, one can segment large collections of points by treating them in groups of four, deriving their structure and hence their motion. Groups of points that are not rigid have a very low probability of being interpreted as rigid, and the rest will presumably cluster into sets that share motions associated with rigid objects in the imaged scene. Thus the method to be described may be adaptable for image segmentation.

The calculation may be applied to coplanar points. If a unique result is derived, it is correct; otherwise, the fact that the points are coplanar is revealed.
Generally, accuracy of two-dimensional positional information can be sacrificed to some degree if more points or more views are supplied. Perspective projections are more difficult to analyze. Such views can easily be treated approximately by the technique of breaking them into four-element groups and treating each group as if it were orthographically projected in a direction depending on its position in the scene. Thus perspective may be dealt with globally, although each group is locally treated as an orthogonal projection. The assumption of orthographic projection implies that the method cannot recover relative depth of objects. The method does not lend itself well to "structure from receding motion," in which the motion information is largely encoded in the perspective effects which render objects larger or smaller as they advance and recede. The method does not serve well to explain human performance on moving images of a few points on nonrigid objects (such as those in Section 7.3.3).

Assume that three orthographic projections of four noncoplanar points are given, and that the correspondence between the points in the projections is known. Translational motion perpendicular to a projection plane is unrecoverable, and translation in a plane parallel to the projection plane is explicitly reproduced in the image by the projection process. The problem thus easily reduces to the case that one of the points is chosen as the origin of coordinates, and stays fixed throughout the process. This treatment follows that of [Ullman 1979].

Let the four points be O, A, B, and C. Three orthographic views, projections on some planes $\Pi_1$, $\Pi_2$, and $\Pi_3$, are the input to the process. A coordinate system is chosen with origin at O, and $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are vectors from O to A, B, and C. Then each view has a two-dimensional coordinate system with the image of O at its origin. Let $\mathbf{p}_i$ and $\mathbf{q}_i$ be the orthogonal unit basis vectors of the coordinate system of $\Pi_i$. Let the image coordinates of A, B, and C on $\Pi_i$ be $(x(a_i), y(a_i))$, $(x(b_i), y(b_i))$, and $(x(c_i), y(c_i))$ for $i = 1, 2, 3$. The calculations produce vectors $\mathbf{u}_{ij}$, which are unit vectors along the lines of intersection of $\Pi_i$ with $\Pi_j$. The image coordinates are in fact

$$\begin{aligned} x(a_i) &= \mathbf{a}\cdot\mathbf{p}_i, & x(b_i) &= \mathbf{b}\cdot\mathbf{p}_i, & x(c_i) &= \mathbf{c}\cdot\mathbf{p}_i, \\ y(a_i) &= \mathbf{a}\cdot\mathbf{q}_i, & y(b_i) &= \mathbf{b}\cdot\mathbf{q}_i, & y(c_i) &= \mathbf{c}\cdot\mathbf{q}_i \end{aligned} \tag{7.38}$$
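Equation (7.38) can be made concrete with synthetic data. The sketch below, using hypothetical random views and points, illustrates the central step that the following equations develop: the coefficients expressing $\mathbf{u}_{ij}$ in each view's basis form the null vector of a 3 × 4 linear system built only from image coordinates (one row per point A, B, C). The true $\mathbf{p}_i$ and $\mathbf{q}_i$ are used only to check the answer.

```python
import numpy as np


def random_view(rng):
    """Orthonormal image axes (p, q) of a random orthographic projection plane."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return Q[0], Q[1]


def image_coords(points, p, q):
    """Eq. (7.38): orthographic image coordinates are dot products with p and q."""
    return points @ p, points @ q


def intersection_coefficients(img_i, img_j):
    """(r, s, t, v) with r p_i + s q_i = t p_j + v q_j, up to sign, from image data only."""
    (xi, yi), (xj, yj) = img_i, img_j
    M = np.column_stack([xi, yi, -xj, -yj])          # one row per point a, b, c
    _, _, Vt = np.linalg.svd(M)
    coeffs = Vt[-1]                                   # null vector of the 3 x 4 system
    return coeffs / np.hypot(coeffs[0], coeffs[1])    # enforce r^2 + s^2 = 1


rng = np.random.default_rng(1)
abc = rng.normal(size=(3, 3))                         # rows: vectors a, b, c from O (hypothetical)
(p1, q1), (p2, q2) = random_view(rng), random_view(rng)
r, s, t, v = intersection_coefficients(image_coords(abc, p1, q1), image_coords(abc, p2, q2))
u12 = r * p1 + s * q1
true_dir = np.cross(np.cross(p1, q1), np.cross(p2, q2))
true_dir /= np.linalg.norm(true_dir)
print(abs(u12 @ true_dir))    # close to 1: u12 lies on the line where the two planes meet
```

The sign of the null vector is arbitrary, which is the two-fold (mirror-image) ambiguity the text describes.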

The unit vector $\mathbf{u}_{ij}$ is on both $\Pi_i$ and $\Pi_j$; hence for some $r_{ij}$, $s_{ij}$, $t_{ij}$, and $v_{ij}$,

$$\mathbf{u}_{ij} = r_{ij}\mathbf{p}_i + s_{ij}\mathbf{q}_i, \qquad r_{ij}^2 + s_{ij}^2 = 1 \tag{7.39}$$

$$\mathbf{u}_{ij} = t_{ij}\mathbf{p}_j + v_{ij}\mathbf{q}_j, \qquad t_{ij}^2 + v_{ij}^2 = 1 \tag{7.40}$$

Equations (7.39) and (7.40) yield

$$r_{ij}\mathbf{p}_i + s_{ij}\mathbf{q}_i - t_{ij}\mathbf{p}_j - v_{ij}\mathbf{q}_j = \mathbf{0} \tag{7.41}$$

Taking the scalar product of $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ with Eq. (7.41) yields three more equations, which are linearly independent. By Eq. (7.38) they involve only image coordinates; for example, the product with $\mathbf{a}$ gives $r_{ij}x(a_i) + s_{ij}y(a_i) - t_{ij}x(a_j) - v_{ij}y(a_j) = 0$. These equations in $r_{ij}$, $s_{ij}$, $t_{ij}$, and $v_{ij}$, combined with Eqs. (7.39) and (7.40), yield two solutions differing only in sign. But this means that (up to a sign) $\mathbf{u}_{ij}$ is determined in terms of the image coordinate basis vectors $(\mathbf{p}_i, \mathbf{q}_i)$ and $(\mathbf{p}_j, \mathbf{q}_j)$. Two $\mathbf{u}$ vectors determine one of the planes of orthogonal projection. For instance, $\mathbf{u}_{12}$ and $\mathbf{u}_{23}$ lie in $\Pi_2$. Given the plane equation for the $\Pi_i$, the three-dimensional locations are computed as the intersection of lines perpendicular to the $\Pi_i$ and through the two-dimensional image points. Of course, because of the ambiguity in sign, the expected mirror-image ambiguity of structure exists.

The extension to the case that $\mathbf{u}_{12} = \mathbf{u}_{23} = \mathbf{u}_{31}$, where the three viewpoints are coplanar, is not difficult. It is perhaps a little surprising that coplanar viewpoints still yield a unique interpretation.

An extension of the mathematics to perspective imaging is not difficult to formulate, but the equations are nonlinear and must be solved either conventionally, say by the multidimensional Newton-Raphson technique of Appendix 1, or perhaps by cooperative algorithms of a more artificial intelligence flavor [Lawton 1981]. In geometrically underconstrained situations, plausible interpretations can sometimes be made by using other knowledge to give constraints. For example, one can minimize a second-difference approximation to the acceleration of points in order to use the "constraint" of smooth motion. Such a criterion may find a single "best" location for points. Another example is the use of position and velocity commonality over time to establish rigid members in linkages (Section 7.3.3), a first step to location determination.

To see how the equations might be set up, consider the perspective geometry of Section 7.2.1. In this simplified Cartesian system, Eqs. (7.1) and (7.2) are used as before. Since $z(x', y', 1) = (x, y, z)$, the location of any point is determined (up

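to a scale factor, since the focal length is not explicit) from its image coordinates and its depth coordinate, $z$. For $F > 1$ images and $N > 3$ points there are $FN - 1$ unknowns (the ability to scale distance allows one point to be placed arbitrarily). To apply the rigid body constraint, enough pairwise distances between points must be specified to lock them into a rigid configuration. For three points, three distances are necessary. Each additional point requires another three distances, and so for each interframe interval $3(N - 2)$ constraints are needed, for a total of $3(F - 1)(N - 2)$ constraints. Thus, whenever $3(F - 1)(N - 2) \ge FN - 1$, that is, whenever

$$2FN - 6F - 3N + 7 \ge 0 \tag{7.42}$$

consistent equations from the constraints can be solved [Lawton 1981]. With two views, five points are needed; with three views, four points. This is not surprising, given the preceding analysis for orthographic projections.

Consider the simple case of two points seen in two frames. If they are rigidly connected, one constraint equation holds. It is equivalent to

$$(\mathbf{x}_{11} - \mathbf{x}_{12})\cdot(\mathbf{x}_{11} - \mathbf{x}_{12}) = (\mathbf{x}_{21} - \mathbf{x}_{22})\cdot(\mathbf{x}_{21} - \mathbf{x}_{22}) \tag{7.43}$$

($\mathbf{x}_{ij}$ and $\mathbf{x}'_{ij}$ are, respectively, the world and image coordinate vectors of point $j$ in frame $i$.) Since $\mathbf{x}_{ij} = z_{ij}\mathbf{x}'_{ij}$ (recall (7.1) and (7.2)), the constraint becomes

$$z_{11}^2(\mathbf{x}'_{11}\cdot\mathbf{x}'_{11}) + z_{12}^2(\mathbf{x}'_{12}\cdot\mathbf{x}'_{12}) - 2z_{11}z_{12}(\mathbf{x}'_{11}\cdot\mathbf{x}'_{12}) - z_{21}^2(\mathbf{x}'_{21}\cdot\mathbf{x}'_{21}) - z_{22}^2(\mathbf{x}'_{22}\cdot\mathbf{x}'_{22}) + 2z_{21}z_{22}(\mathbf{x}'_{21}\cdot\mathbf{x}'_{22}) = 0 \tag{7.44}$$

A further constraint that objects only move in the "ground plane," or at a constant $y$, has the effect of removing two unknowns through substitution in the constraint equation above, since for arbitrary $m$ and $n$,

$$y_{im} = z_{im}\,y'_{im} = y_{in} = z_{in}\,y'_{in} \tag{7.45}$$

so that each depth $z_{in}$ can be written in terms of $z_{im}$ and known image coordinates.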
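The counting argument behind Eq. (7.42) and the pairwise rigidity constraint can both be checked in a few lines. A sketch with 0-based frame and point indices and hypothetical test geometry; `pair_constraint` evaluates Eq. (7.43) written with $\mathbf{x}_{ij} = z_{ij}\mathbf{x}'_{ij}$, which expands term by term into Eq. (7.44), so it vanishes for any rigid motion.

```python
import numpy as np


def enough_constraints(F, N):
    """Eq. (7.42): 3(F-1)(N-2) rigidity constraints versus FN - 1 unknown depths."""
    return 2 * F * N - 6 * F - 3 * N + 7 >= 0


def min_points(F):
    """Smallest N satisfying Eq. (7.42) for F >= 2 frames."""
    if F < 2:
        raise ValueError("need at least two frames")
    N = 3
    while not enough_constraints(F, N):
        N += 1
    return N


def pair_constraint(z, xp):
    """Eq. (7.43) in depths and image vectors; z[i][j], xp[i][j] for frame i, point j (0-based)."""
    d1 = z[0][0] * xp[0][0] - z[0][1] * xp[0][1]
    d2 = z[1][0] * xp[1][0] - z[1][1] * xp[1][1]
    return d1 @ d1 - d2 @ d2


print([min_points(F) for F in (2, 3, 4)])          # [5, 4, 4]

# Two points before and after a hypothetical rigid motion (rotation about y, then translation).
a = 0.3
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
world1 = np.array([[1.0, 2.0, 10.0], [-2.0, 1.0, 12.0]])
world2 = world1 @ R.T + np.array([0.5, -0.3, 1.0])
z = [world1[:, 2], world2[:, 2]]
xp = [world1 / world1[:, 2:3], world2 / world2[:, 2:3]]   # image vectors (x', y', 1)
print(pair_constraint(z, xp))                      # zero up to rounding for a rigid motion
```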

As a final example, a restriction to purely translational motion of the point configurations yields the constraint

$$(\mathbf{x}_{11} - \mathbf{x}_{21}) - (\mathbf{x}_{12} - \mathbf{x}_{22}) = \mathbf{0} \tag{7.47}$$

Expanding this as the product of unknown depths ($z$) and known image positions ($\mathbf{x}'$) yields a vector equation that may be written componentwise as three linear equations in four unknowns. Recall that a focal length must be fixed, effectively setting one unknown: setting one $z_{ij}$ to 1 gives a system of three linear equations in the other three $z_{ij}$.
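The translational case reduces to a 3 × 3 linear solve. A minimal sketch, assuming the same indexing as Eq. (7.43) (frame $i$, point $j$), unit focal length, and hypothetical test points; fixing $z_{11} = 1$ means the recovered depths are the true depths divided by $z_{11}$.

```python
import numpy as np


def depths_under_translation(xp11, xp12, xp21, xp22):
    """Solve Eq. (7.47) for (z12, z21, z22) with z11 fixed to 1.

    xp_ij is the image vector (x', y', 1) of point j in frame i, and
    z11 xp11 - z21 xp21 - z12 xp12 + z22 xp22 = 0  becomes  A [z12, z21, z22] = -xp11.
    """
    A = np.column_stack([-xp12, -xp21, xp22])
    return np.linalg.solve(A, -xp11)


def project(X):
    """Image vector (x', y', 1) of world point X under unit focal length."""
    return X / X[2]


P1 = np.array([1.0, 2.0, 10.0])        # hypothetical world points in frame 1
P2 = np.array([-2.0, 1.0, 12.0])
T = np.array([0.5, -0.3, -2.0])        # hypothetical translation between frames
z12, z21, z22 = depths_under_translation(project(P1), project(P2),
                                         project(P1 + T), project(P2 + T))
print(z12, z21, z22)                   # true depths 12, 8, 10 divided by z11 = 10
```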
7.3.3 Interpretation of Moving Light Displays: A Domain-Independent Approach

One of the domains that provides the purest aspects of motion vision is moving light displays (MLDs). These are sequences of images which track only a few discrete points per frame. A typical way to produce an MLD is to attach small glass bead reflectors to a person's major joints (shoulders, elbows, wrists, hips, knees,

ankles), focus a strong light on him or her, and manipulate the contrast of a videotape recorder so as to produce on videotape a record of the movement of the reflective points on the joints. A single frame from such a record is unrecognizable by an inexperienced subject (Fig. 7.8). However, a sequence of such frames quickly gives (typically in 0.4 second) not only a compelling perception of motion of a three-dimensional body, but also allows recognition of the sequence as depicting a walking person, and a description of the type of motion (walking backward, jumping, walking left). Complicated scenes such as several independently moving bodies and couples dancing can be recognized. Sophisticated judgments can be made, such as determining the sex of a subject from an MLD, or recognizing the gait of a friend [Johansson 1964].

MLDs thus present quite a challenge to computer vision. It could be that MLDs of moving people are interpreted by specialized neural mechanisms expressly tailored to the purpose of dealing with any visual input whatever that suggests moving people. MLDs certainly demonstrate that texture, continuous fields of flow, and especially the interpretability of static versions of the scene are not necessary for human beings to do complex perception of certain three-dimensional objects.

Fig. 7.8 An MLD for a man walking his dog. (Sixteen frames, from frame 1 to frame 61 at intervals of four.)

This section is concerned with MLDs of moving human beings, and the interpretation we desire consists of separating images of individuals, deriving their "connectivity" (i.e., the rigid links that connect the points), and possibly describing the three-dimensional motion in which the subjects are engaged.

MLDs produced with perspective projection have few of the pleasant properties of the rigid orthographic projection which were used in Section 7.3.2. In particular, both translating and rotating objects are inherently ambiguous in perspective projections [Roache and Aggarwal 1979]. The approximate method outlined in Section 7.3.2, in which local groups of four points are considered rigid and orthographically projected, fails for MLDs of walking people. In many applications, digitization error will limit severely the accuracy returned. Worse, in a typical 12-point MLD of a moving person, there is never a rigid system of four noncoplanar points. The small departures from rigidity occurring in 30 ms of normal walking are enough to render the rigidity assumptions invalid [Rashid 1980].

An algorithm in [Badler 1975] extracts the trajectory of two moving points if they move in parallel paths and are viewed by spherical projection. The projection conditions are approximately met in typical moving-person MLDs, but the lack of points moving in parallel paths is enough to render the algorithm inapplicable.

A good start in the interpretation of MLDs involves solving the point correspondence problem between frames. Knowing how points move from frame to frame gives at least a start on perceiving the continuity of the objects in the scene. Solving this problem from frame to frame may be attacked in any number of ways; the relaxation approach of Section 7.3.1 is an example. Another is to predict the location of a point in the two-dimensional image from its velocity in the preceding frame.
Velocity is computed from the differences in position of the point in the preceding two frames. Predicting where a point will be in frame 3 implies that one knew which point it was in frames 1 and 2. One way of getting the process started is to associate points in frames 1 and 2 that are nearest neighbors. Evidence suggests that human beings in fact are not infallible trackers of points in MLDs [Rashid 1980]. However, they do not let local inconsistencies in point interpretation (say, if the ankle momentarily "turns into" the knee) detract from their overall perception of a moving person. This is a good example of how inconsistent interpretations arise in human vision.

A program can be given similar resilience by having it suspend judgment on contradictory clues and use succeeding frames to resolve the problem [Rashid 1980; O'Rourke 1980]. Having established local point correspondences, the next problem is to group the points into coherent three-dimensional structures and separate individual bodies moving in the scene. When constraints on the scene are available that make analytic techniques applicable (Section 7.3.2), explicit grouping of points prior to analysis may be unnecessary. In fact, with complex MLDs such as Ullman studied (e.g., two transparent but spotty coaxial cylinders rotating in opposite directions about an axis in the viewing plane), most naive grouping strategies based on two-dimensional motion in the image will fail. Ullman's method chooses four-tuples of points from such a scene; on the average seven-eighths of such groups involve points from both cylinders (if the points are split evenly, a random four-tuple lies on a single cylinder with probability $2 \times (1/2)^4 = 1/8$), but with accurate data the algorithm can identify such nonrigid four-tuples. The remaining one-eighth of the groups have consistent interpretations as rigid rotating groups, and the groups fall into two classes, one for each cylinder.

One straightforward heuristic approach to MLD interpretation enjoys moderate success and does not use domain-dependent models [Rashid 1980]. It has the characteristic that it deals exclusively with two-dimensional motions in order to extract information about three dimensions. The approach is more heuristic than Lawton's and certainly more than Ullman's (Section 7.3.2). It is prey to many of the same pitfalls that threaten any image-based (as opposed to world-based) approach to computer vision. With sparse MLDs of nonrigid objects, clustering algorithms may be used to group points into related structures. Rashid's method computes the minimum spanning tree of points in a four-dimensional space of two-dimensional position and two-dimensional velocity. That is, each point in the MLD is represented at any time $t$ by a four-vector

$$(x(t),\ y(t),\ u(t),\ v(t))$$


where u and v are the velocity components in the image x and y directions. Points may be clustered in this position-velocity space on the basis of a four-dimensional Euclidean metric, modified by information about distances derived from preceding frames. Perspective distortion can affect the usefulness of two-dimensional distances computed in previous frames, and data scaling is useful to establish a reasonable relation between units in the four-dimensional space. Rashid's technique is to scale the data in each dimension to have unit variance and zero mean, and to compute cumulative distances between points in a frame by a function such as

$$D_n(i, j) = d(i, j) + 0.95\, D_{n-1}(i, j) \qquad (7.48)$$

where $D_n(i, j)$ is the cumulative distance between points $i$ and $j$ in frame $n$, and $d(i, j)$ is their Euclidean distance in that frame. This clustering method can successfully group points on the two cylinders in the rotating-cylinder sequence mentioned above after seven frames. Figure 7.9 gives the results of clustering the data for the MLD of Fig. 7.8. Clustering is stable after some 25 frames (about one-half of a step).
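A minimal sketch of MST clustering in the four-dimensional position-velocity space, assuming NumPy, one (x, y, u, v) row per tracked point with the same point order in every frame, per-frame normalization to zero mean and unit variance, and Eq. (7.48) for the cumulative distance. The text does not say how clusters are read off the tree; cutting the k − 1 longest edges, used here, is an assumption, and all function names are hypothetical.

```python
import numpy as np

def normalize(features):
    """Scale each column to zero mean and unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (features - mean) / std

def cumulative_distances(frames, decay=0.95):
    """frames: list of (N, 4) arrays of [x, y, u, v] rows, same point order in each.
    Returns D_n from Eq. (7.48): D_n = d_n + decay * D_{n-1}."""
    D = None
    for f in frames:
        z = normalize(np.asarray(f, dtype=float))
        diff = z[:, None, :] - z[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distance in this frame
        D = d if D is None else d + decay * D
    return D

def minimum_spanning_tree(D):
    """Prim's algorithm on a dense symmetric distance matrix; returns (i, j, w) edges."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()           # cheapest known link from the tree to each point
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        update = (D[j] < best) & ~in_tree
        best[update] = D[j][update]
        parent[update] = j
    return edges

def cluster(D, n_clusters):
    """Drop the n_clusters - 1 longest MST edges and label the connected components."""
    n = D.shape[0]
    edges = sorted(minimum_spanning_tree(D), key=lambda e: e[2])
    labels = list(range(n))      # union-find parent pointers

    def find(a):
        while labels[a] != a:
            labels[a] = labels[labels[a]]
            a = labels[a]
        return a

    for i, j, _ in edges[: n - n_clusters]:
        labels[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    remap = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [remap[r] for r in roots]
```

For two groups of three points moving in opposite directions, `cluster(cumulative_distances([frame1, frame2]), 2)` separates the groups. Prim's algorithm on a dense matrix is chosen because MLDs are sparse (tens of points), so O(N²) work per frame is negligible.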
7.3.4 Human Motion Understanding: A Model-Directed Approach

Human motion understanding may be done with a much different approach than the heuristic clustering applied to MLDs in Section 7.3.3. A very detailed model of the domain can help restrict search, make inferences, disambiguate clues, and so forth. A program for understanding images of human motion successfully uses such an approach [O'Rourke 1980; O'Rourke and Badler 1980]. The body model accounts for such factors as relative location of body parts, joint angle ranges, joint angle acceleration limits, collision checking, and gravity. A motion simulation program drives a "bubble man" representation of a person

[Figure: minimal spanning trees ("MST for frame") at frames 7, 12, 17, 22, 27, 32, 37, and 42.]
Fig. 7.9 The minimal spanning tree for the man and dog.

(Fig. 7.10a) [Badler and Smoliar 1979]. This representation is used to produce a shaded graphic rendition which serves as input to the motion understanding program (Fig. 7.10b). Knowledge of the imaging process also provides constraints on the configuration of the figure represented. For instance, perspective, the figure/ground distinction, the location of features, and occlusion all have implications for the interpretation of the scene as a configuration of the model. The system is another example of a cooperative, constraint-satisfying system (Chapter 12), this time one that involves a high-level domain-dependent model.

Fig. 7.10 Understanding human motion through the incorporation of many constraints. (a) Bubble Man from simulation program. (b) Input to motion understander; a bowing man. (c, d) Initial and final stages in understanding the motion of the bowing man.

The constraints imposed by the model restrict the application of low-level operators, and their results reduce uncertainty in parts of the model configuration. Through the relations between model parts, improved estimates for part locations are evolved and propagate throughout the model. Figure 7.10c and d show how the image of the bowing man is understood more accurately as time passes and more constraints are propagated through the model. It should be noted that only the hand, foot, and head features are explicitly searched for in the image. The boxes represent possible locations for the obvious body parts. Note how the occlusion has been understood.

7.3.5 Segmented Images

Moving Polygons and Line Drawings

As one step along the way to motion understanding, the analysis of ideal polygonal images was popular for a time [Aggarwal and Duda 1975; Martin and Aggarwal 1978; Potter 1975]. The assumptions are usually that opaque polygons move in parallel planes and may obscure one another (this is often called a 2.5-dimensional situation). The viewpoint is somewhere "above" the collection of moving shapes. The viewer (program) is presented with a sequence of frames either of line drawings or of gray-level images of the scene (Fig. 7.11). Polygon motion is assumed small between frames. The goal is usually to segment the scenes into polygons, and to extract such information as their direction and speed of motion. The solutions to these problems usually reflect assumptions about the connectivity of the polygons, or restrictions on their motion, and often revolve about the allowable topological and geometrical transformations that can take place in such scenes. For instance, in a frame with two polygons such as that shown in Fig. 7.12, certain scene vertices belong to primitive polyhedra (they are "true" vertices), whereas others are "false" artifacts of occlusion. The lines impinging at true vertices will not change their angle of meeting through time, but false vertices may change angles if the polygons rotate as they move. False vertices are usually obtuse. Complex connectivity changes can arise when nonconvex polygons slide past one another.
Sorting out a coherent interpretation of a sequence of frames, especially in the presence of noisy vertex positions, is a challenging exercise.

A system was designed in [Badler 1975] which used sequences of line drawings produced by a spherical projection of a three-dimensional world to reconstruct

Fig. 7.11 Two frames from a motion image of three moving polygons.



Fig. 7.12 True (T) and False (F) vertices in a scene of two overlapping polygons.

some three-dimensional aspects of the input, and to transform the pictorial input into natural language descriptions of motion.

Similarity Analysis, Then Difference Measurement

This approach is probably the most intuitive if motion perception is thought to be built up from perception of successive frames. The idea is simply to extract an object in one frame, and to search for it in the next frame. Obviously, the basic techniques here are the description extraction process (i.e., static computer vision, the topic of most of this book) and matching (Chapter 11).

The entire range of matching techniques, from image matching to description matching, has been applied to image sequences. One characteristic of this approach in its pure form is that motion is merely a nuisance: segmentation is performed without using motion information. Usually the approach is pursued in a more pragmatic and domain-dependent fashion: for instance, the matching may be guided by knowledge about the motions.

One advanced system that uses this basic paradigm is described in [Price 1976; Price 1978; Price and Reddy 1977]. It segments and describes both images first. Using the symbolic descriptions, it matches complex scenes (such as houses or aerial images) that have been relatively rotated by large amounts (45° to 180°) and have size differences as well. It also derives the geometric transformation that produced the second image from the first.

Clearly, the major problems in systems of this sort come from generating and matching descriptions. The matching must be sophisticated, and to be successful in general it must combine symbolic and geometric components. The constraint that successive frames do not reflect violent motions eases the matching problem considerably, and iconic correlation techniques may sometimes apply.

Difference Measurement, Then Similarity Analysis

The idea behind this approach is to guide the similarity analysis with information about image differences. This seems a promising idea, because differences are easy to compute, whereas the very definition of similarity is open to question, and computing it may be arbitrarily complex.

In particular, in locating moving objects in an image sequence, one is invited to ignore the stationary background. The area of changing image can be tracked easily from image to image, and subjected to further analysis. Rather than trying to track an object from image to image, it is attractive to consider letting the object move far enough that it does not overlap between two images. Then the difference between the images will actually reflect the structure of the object.

One possible method [Nagel 1978a, 1978b; Jain and Nagel 1978] proceeds as follows:

1. Obtain two images from the motion sequence such that the object of interest will have moved far enough not to overlap in position in the two images. (One clearly needs information about the objects and the imaging parameters to assure no overlap.)
2. Segment the two images into regions.
3. Compute a dissimilarity measure between the overlapping areas of regions in the two images. One reasonable measure is the likelihood ratio for the two hypotheses that the intensities in the overlaps come from the same distribution of intensities or from different distributions.
4. In one of the images, take all regions that are most consistent with the hypothesis of different distributions and assume that they arise from the moving object (or its old vacated position). Merge these regions by a reasonable technique into one which is taken to include the moving object.
5. Take the boundary of the candidate region and use it as a template for correlation detection tracking between adjacent frames.
6. The offsets revealed by the correlation process give the velocity, and can be used to "subtract out" the motion, register the views of the object in several images, and thus obtain a more accurate characterization of the object.

This approach leads to results such as those shown in Fig. 7.13.
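Step 3 leaves the intensity model open. A minimal sketch, assuming each set of overlap intensities is Gaussian with mean and variance fitted by maximum likelihood (an assumption, not stated in the text), gives the log of the likelihood ratio between "different distributions" and "same distribution"; the function name is hypothetical.

```python
import math

def log_likelihood_ratio(region1, region2):
    """log[L(two Gaussians) / L(one Gaussian)] for two samples of intensities.

    With maximum-likelihood fits the ratio reduces to
    ((m + n) log var0 - m log var1 - n log var2) / 2, where var0 is the pooled
    variance; it is >= 0, and larger values favor "different distributions".
    """
    def mle_var(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    m, n = len(region1), len(region2)
    eps = 1e-12  # guards log(0) when a region is perfectly uniform
    var0 = mle_var(list(region1) + list(region2)) + eps
    var1 = mle_var(region1) + eps
    var2 = mle_var(region2) + eps
    return 0.5 * ((m + n) * math.log(var0) - m * math.log(var1) - n * math.log(var2))
```

Regions whose overlaps score highest would be the candidates kept in step 4. Any threshold on the score is application-dependent and is not given in the text.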

EXERCISES

7.1 Write a geometric explanation of the FOE phenomenon.
7.2 Devise a motion segmentation scheme for rigid bodies in translational three-dimensional motion that uses the FOE calculation.
7.3 Prove that the parametric flow path equation (7.3) indeed does produce a straight line in image coordinates.
7.4 Prove the time-to-adjacency relation (7.5). A geometrical demonstration may be made with similar triangles; an algebraic one is not very hard.
7.5 Express Eq. (7.12) as much as possible in terms of observables in the optical flow "image." What is left unspecified?
7.6 Perform Exercise 7.5 with equation (7.13).

Fig. 7.13 Motion from segmented images. (a) Initial and (b) final frames from a 16-frame sequence. The object of interest is the car moving left to right in the intersection. (c) Car segmentation from an intermediate frame. (d) Car reconstructed from several frames; the gray values result from aligning the values extracted from individual frames by segmentation.

7.7 Specialize the result of Exercise 7.6 to the case that the observer is moving in the direction of his direction of view [the FOE is at (0, 0)].
7.8 Fill in the steps in the derivations of the general and special cases of 8 and e (Eqs. (7.18) and (7.21) through (7.23)).
7.9 Fill in the steps in the derivations of tan o and tan r (Eqs. (7.28) and (7.29)).
7.10 Show how to compute absolute depth from flow (Section 7.2.2) if the observer speed is known.
7.11 The Laplacian of e in Section 7.2.3 is the sum of the second partial derivatives of e with respect to θ and φ. Write it out and show that it has singularities only when the Laplacian of depth (/) does, except at φ = 0 or π.
7.12 In Section 7.2.2, the θ, φ system is divorced from the retinal position. How might this coordinate system be deduced from optical flow, or how might this deduction be unnecessary?
7.13 Work out the details of the vector equation referred to in the last paragraph of Section 7.3.2.
7.14 What do flow paths look like if the observer (or the environment) only executes rotational motion? Pick a congenial coordinate system and prove your supposition.
7.15 Tighten up the "common motion" heuristic in Section 7.1.2. What domains under what sorts of world motion yield what sorts of "common" image motions for objects?

REFERENCES

AGGARWAL, J. K. and R. O. DUDA. "Computer analysis of moving polygonal images." IEEE Trans. Computers 24, 1975, 966-976.
AGGARWAL, J. K., L. S. DAVIS, and W. N. MARTIN. "Correspondence processes in dynamic scene analysis." Proc. IEEE 69, 5, May 1981, 562-571.
BADLER, N. "Temporal scene analysis: conceptual descriptions of object movements." Technical Report 80, Dept. of Computer Science, Univ. Toronto, February 1975.
BADLER, N. I. and S. W. SMOLIAR. "Digital representations of human movement." Computing Surveys 11, 1, 19-38, March 1979.
BARNARD, S. T. and W. B. THOMPSON. "Disparity analysis of images." Technical Report 79-1, Computer Science Dept., Univ. Minnesota, January 1979.
BRAUNSTEIN, M. L. Depth Perception through Motion. New York: Academic Press, 1976.
CLOCKSIN, W. F. "Computer prediction of visual thresholds for surface slant and edge detection from optical flow fields." Ph.D. dissertation, Univ. Edinburgh, 1980.
DUDA, R. O. and P. E. HART. Pattern Recognition and Scene Analysis. New York: Wiley, 1973.
GIBSON, J. J. The Perception of the Visual World. Boston: Houghton Mifflin, 1950.
GIBSON, J. J. "Continuous perspective transformations and the perception of rigid motion." J. Experimental Psychology 54, 1957, 129-138.
GIBSON, J. J. "Research on the visual perception of motion and change." In Readings in the Study of Visually Perceived Movement, Irwin M. Spigel (Ed.). New York: Harper & Row, 1965.
GIBSON, J. J. The Ecological Approach to Visual Perception. Ithaca, NY: Cornell University Press, 1966.
HELMHOLTZ, H. VON. Treatise on Physiological Optics (translated by J. P. C. Southall). New York: Dover Publications, 1925.
HORN, B. K. P. and B. G. SCHUNCK. "Determining optical flow." AI Memo 572, AI Lab, MIT, April 1980.
JAIN, R. and H.-H. NAGEL. "On a motion analysis process for image sequences from real world scenes." Proc., IEEE Workshop on Pattern Recognition and Artificial Intelligence, Princeton, NJ, 1978.
JOHANSSON, G. "Perception of motion and changing form." Scandinavian J. Psychology 5, 1964, 181-208.
LAWTON, D. T. "The processing of dynamic images and the control of robot behavior." Ph.D. dissertation, Univ. Massachusetts, 1981.
LEE, D. N. and J. R. LISHMAN. "Visual proprioceptive control of stance." J. Human Movement Studies 1, 1975, 87-95.

MARTIN, W. N. and J. K. AGGARWAL. "Dynamic scene analysis." CGIP 7, 1978, 356-374.
MORAVEC, H. P. "Towards automatic visual obstacle avoidance." Proc., 5th IJCAI, August 1977, 584.
NAGEL, H.-H. "Formation of an object concept by analysis of systematic time variations in the optically perceptible environment." CGIP 7, 2, June 1978a, 149-194.
NAGEL, H.-H. "Analysis techniques for image sequences." Proc., 4th IJCPR, November 1978b, 186-211.
NAKAYAMA, K. and J. M. LOOMIS. "Optical velocity patterns, velocity-sensitive neurons, and space perception." Perception 3, 1974, 63-80.
O'ROURKE, J. "Image analysis of human motion." Ph.D. dissertation, The Moore School of Electrical Engineering, Univ. Pennsylvania, 1980.
O'ROURKE, J. and N. I. BADLER. "Model-based image analysis of human motion using constraint propagation." IEEE Trans. PAMI 2, 4, November 1980.
POTTER, J. L. "Velocity as a cue to segmentation." IEEE Trans. SMC 5, 1975, 390-394.
PRAGER, J. M. "Segmentation of static and dynamic scenes." COINS Technical Report 79-7, Computer and Information Science, Univ. Massachusetts, May 1979.
PRAZDNY, K. "Egomotion and relative depth map from optical flow." Computer Science Dept., Univ. Essex, March 1979.
PRICE, K. E. "Change detection and analysis in multispectral images." Ph.D. dissertation, Dept. of Computer Science, Carnegie-Mellon Univ., 1976.
PRICE, K. E. "Symbolic matching and analysis with substantial changes in orientation." Proc., IEEE Workshop on Pattern Recognition and Artificial Intelligence, Princeton, NJ, 1978.
PRICE, K. E. and R. REDDY. "Change detection and analysis in multispectral images." Proc., 5th IJCAI, August 1977, 619-625.
RASHID, R. F. "LIGHTS: a system for interpretation of moving light displays." Ph.D. dissertation, Computer Science Dept., Univ. Rochester, April 1980.
ROACHE, J. W. and J. K. AGGARWAL. "On the ambiguity of three-dimensional analysis of a moving object from its images." IEEE Workshop on Computer Analysis of Time-Varying Imagery, April 1979.
ROGERS, B. and M. GRAHAM. "Motion parallax as an independent cue for depth." Perception 8, 1979, 125-134.
SCHIFF, W. "The perception of impending collision: A study of visually directed avoidant behavior." Psychological Monographs 79, 1965.
SHAPIRA, R. "A technique for the reconstruction of a straight-edge, wire-frame object from two or more central projections." CGIP 3, 4, December 1974, 318-326.
SHAPIRA, R. and H. FREEMAN. "Computer description of bodies bounded by quadratic surfaces from a set of important projections." IEEE Trans. Computers 27, 9, September 1978, 841-854.
SNYDER, W. E. (Ed.). "Computer analysis of time-varying images." IEEE Computer 14, 8, August 1981.
THOMPSON, W. B. "Combining motion and contrast for segmentation." Technical Report 79-7, Computer Science Dept., Univ. Minnesota, March 1979.
TSOTSOS, J. K., J. MYLOPOULOS, H. D. COVVEY, and S. W. ZUCKER. "A framework for visual motion understanding." IEEE Trans. PAMI 2, 6, November 1980, 563-573.
ULLMAN, S. The Interpretation of Visual Motion (Ph.D. dissertation). Cambridge, MA: MIT Press, 1979.
WALLACH, H. and D. N. O'CONNELL. "The kinetic depth effect." J. Experimental Psychology 45, 4, 1953, 205-217.
WESLEY, M. A. and G. MARKOVSKY. "Fleshing out projections." Research Rpt. RC 8884, Computer Sciences Dept., IBM, T. J. Watson Research Center, April 1981.


GEOMETRICAL STRUCTURES

Ultimately, one of the most important things to be determined from an image is the shape of the objects in it. Shape is an intrinsic property of three-dimensional objects; in a sense it is the primal intrinsic property for the vision system, from which many others (surface normals, object boundaries) can be derived. It is primal in the sense that we associate the definitions of objects with shape, rather than with color or reflectivity, for example.

Webster defines shape as "that quality of an [object] which depends on the relative position of all points composing its outline or external surface." This definition emphasizes the fact that we are aware of shapes through outlines and surfaces of objects, both of which may be visually perceived. It also makes the distinction between the two-dimensional outline and the three-dimensional surface. We preserve this distinction: Chapter 8 deals with two-dimensional shapes, Chapter 9 with three-dimensional shapes.

If our goal is to understand flat images, why bring solids into consideration? Our simple answer is that we believe in many cases vision without a "solid basis" is a practical impossibility. Much of the recent history of computer vision demonstrates the advantages that can be gained by acknowledging the three-dimensional world of objects. The appearance of objects in images may be understood by understanding the physics of objects and the imaging process. The purest form of two-dimensional recognition, template matching, clearly does not practically extend to a world where objects appear in arbitrary positions, much less to a world of nonrigid objects. It is true that in some important image understanding tasks (interpretation of chest radiographs, ERTS images, or some microscope slides), the third dimension is irrelevant. But where the three-dimensionality of objects is important, the considerable effort necessary to develop a usable three-dimensional model will always be amply repaid.

Shape recognition is doubtless one of the most important facilities of the mammalian visual system. We have seen how important shape information can be
extracted from images in early processing and segmentation. One of the major challenges to computer vision is to represent shapes, or the important aspects of shapes, so that they may be learned, matched against, recollected, and used. This effort is hampered by several factors.

1. Shapes are often complex. Whereas color, motion, and intensity are relatively simply quantified by a few well-understood parameters, shape is much more subtle. Common manufactured or natural shapes are incredibly complex; they may be represented "explicitly" (say by representing their surface) only with hundreds of parameters. Worse, it is not clear what aspects of shapes are important for applications such as recognition. An explicit and complete representation may be computationally intractable for such basic uses as matching. What "shape features" can be used to ease the burden of computation with complex shapes?
2. Introspection is no help. Human beings seem to have a large fraction of their brains devoted to the single task of shape recognition. This important activity is largely "wired in" at a level below our conscious introspection. Why is shape recognition so easy for human beings and shape description so hard? The fact that we have no precise language for shape may argue for the inaccessibility of our shape-processing algorithms or data structures. This lack of cognitive leverage is a trifle daunting, especially when taken with the complexity of everyday shapes.
3. There is little classical guidance. Mathematics traditionally has not concerned itself with shape. For instance, only recently has there been a mathematical definition of "rigid solid" that accords with our intuition and of set operations on solids that preserve their solidity. The fact that such basic questions are only now being addressed indicates that computer science must do more than encode some already existing proven ideas. Thus we have the next point.
4. The discipline is young. Until very recently, human beings communicated about complex shapes mainly through words, gestures, and two-dimensional drawings.
   It was not until the advent of the digital computer that it became of interest to represent complex shapes so that they could be specified to the machine, manipulated, computed with, and represented as output graphics. No generally accepted single representation scheme is available for all shapes; several exist, each with its advantages and disadvantages. Algorithms for manipulating shapes (for example, for computing how to move a sofa up a flight of stairs, or computing the volume of a specified shape) are surprisingly complex, and are research topics. Often the representations good for one application, such as recognition, are not good for other computations.

It is the intention of this part of the book to indicate some of what is known about the representation of shape. Although the details of geometric representations may be still under development, they are an essential part of our layered computer vision organization. They are more abstract than segmented structures and are distinguished from relational structures by their preponderance of metric information.

Representation of Two-Dimensional Geometric Structures

8.1 TWO-DIMENSIONAL GEOMETRIC STRUCTURES

The structures of this chapter are the intuitive ones of well-behaved planar regions and curves. A mathematical characterization of these structures that bars "pathological" cases (such as regions of a single point and space-filling curves) is possible [Requicha 1977]. Basically the requirement is that regions be "homogeneously two-dimensional" (contain no hanging or isolated structures of different dimension: solids, lines, or points). Similarly, curves should be homogeneously one-dimensional. The property of regularity is sometimes important; a regular set is one that is the closure of its interior (in the relevant one- or two-dimensional topology). Intuitively, regularizing a two-dimensional set (taking the closure of its interior) first removes any hanging one- and zero-dimensional parts, then covers the remainder with a tight skin (Fig. 8.1). In computer vision, often regions and curves are discrete, being defined on a raster of pixels or on an orthogonal grid of possible primitive edge segments. It is frequently convenient to associate a direction with a curve, hence ordering the points along it and defining portions of the plane to its left and right.

The one-dimensional closed curve that bounds a well-behaved region is an unambiguous representation of it; Section 8.2 deals with representations of curves and hence indirectly of regions. Section 8.3 deals with other unambiguous representations of regions that are not based on the boundary. Sometimes unambiguous representation is not the issue; it may be important to have a qualitative description of a region (its size or shape, say). Section 8.4 presents several terse descriptive properties for regions.


Fig. 8.1 (a, b, c) are regions; (d), (e), and (f) are not. [One panel includes an expanded view of a neighborhood.]

8.2 BOUNDARY REPRESENTATIONS

8.2.1 Polylines

The "two-point" form of a line segment (see Appendix 1) extends easily to the polyline, which represents a concatenation of line segments as a list of points. Thus the point list $x_1, x_2, x_3$ represents the concatenation of the line segments from $x_1$ to $x_2$ and from $x_2$ to $x_3$. If the first point is the same as the last, a closed boundary is represented.

Polylines can approximate most useful curves to any desired degree of accuracy. One might think there is one obvious way to approximate a boundary curve (or raw data) with a polygonal line. This is not so: many different approaches are possible. Finding a satisfying polygonal approximation to a given curve basically involves segmentation issues. The problem is to find corners or breakpoints that
yield the "best" polyline. As with region-based segmentation schemes, the ideas here can be characterized by the concepts of merging and splitting. Splitting and merging schemes may be combined, especially if the appropriate number of linear segments is known beforehand. For details, see [Horowitz and Pavlidis 1976].

In a merging algorithm, points along a curve (possibly in image data) are considered in order and accepted into a linear segment as long as they fit sufficiently well. When they do not, a new segment is begun. The efficiency and characteristics of these schemes are quite variable, and endless variations on the general idea are possible. A few examples of "one-pass" merging schemes are given here; explicit algorithms are available in [Pavlidis 1977].

If the boundary (represented on a discrete grid) is known to be piecewise linear, it is specified by its breakpoints. To find them, one can look along the boundary, monitoring the angle between two line segments. One segment is between the current point and a point several points back along the boundary; the other is between the current point and one several points forward. When the angle between these segments reaches a maximum over some threshold, a breakpoint is declared at the current point. This scheme does not adjust breakpoint positions, and so is fast [Shirai 1975], but works best for piecewise linear input curves.

Tolerance-band solutions place a point on either side of the curve at the maximum allowable error distance, and then find the longest piece of the curve that lies entirely between parallel lines through the two points [Tomek 1974]. This method proceeds without breakpoint adjustment, and may not find the most economical set of segments (Fig. 8.2).

An approximation of a curve with a polyline of minimum length in error by at most a pixel is given in [Sklansky and Kibler 1976]. Each curve pixel is considered a square and the resulting pixel structure is four-connected. The approximation describes the shape of an elastic thread placed in the pixel structure (Fig. 8.3). The

Fig. 8.2 Simple tolerance-band solution (dotted lines). Better solution (solid lines).

Fig. 8.3 Minimum-length polyline.

method tends to have difficulties with curves that are sharp relative to the grid size.

Another scheme [Roberts 1965] is to keep a running least-squared-error best-fit line calculation for points as they are merged into segments (Appendix 1). When the residual (error) of a point goes over some threshold or the accumulated error for a segment exceeds a threshold, a new segment is started. Difficulties arise here because the concept of a breakpoint is nonexistent; breakpoints just occur at the intersections of the best-fit lines, and without a phase of adjusting the set of points to be fit by each line (analogous to breakpoint adjustment), they may not be intuitively appealing.

Generally, one-pass merging schemes do not produce the most satisfying polylines possible under all conditions. Part of the problem is that breakpoints are only introduced after the fit has deteriorated, usually indicating that an earlier breakpoint would have been desirable.

In a splitting scheme, segments are divided (usually into two parts) as long as they fail some fitting condition [Duda and Hart 1973; Turner 1974]. Algorithm 8.1 provides an example.

Algorithm 8.1: Curve Approximation

1. Given a curve as in Fig. 8.4a, draw a straight line between its end points (Fig. 8.4b).
2. For every point on the curve, compute its perpendicular distance to the approximating (poly)line. If it is everywhere within some tolerance, exit.
3. Otherwise, pick the curve point farthest from the approximating (poly)line, make it a new breakpoint (Fig. 8.4c), and replace the relevant segment of polyline with two new line segments.
4. Recursively apply the algorithm to the two new segments (Fig. 8.4d).

A straightforward extension is needed to deal with the case of curve segments parallel to the approximating one at maximum distance (Fig. 8.4e).
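A minimal sketch of Algorithm 8.1 for an open curve given as a list of at least two (x, y) points. The function names and the tie rule are hypothetical: when several points share the maximum distance (the parallel case of Fig. 8.4e), this version simply splits at the first of them.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (to a itself if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def split_curve(points, tolerance):
    """Indices of the breakpoints (both end points included) found by Algorithm 8.1."""
    def recurse(lo, hi):
        # Step 2: distance of every interior point to the chord from lo to hi.
        best_i, best_d = None, -1.0
        for i in range(lo + 1, hi):
            d = perpendicular_distance(points[i], points[lo], points[hi])
            if d > best_d:           # strict ">" keeps the first farthest point
                best_i, best_d = i, d
        if best_i is None or best_d <= tolerance:
            return [lo]              # chord accepted; report its start point
        # Steps 3-4: break at the farthest point and recurse on both halves.
        return recurse(lo, best_i) + recurse(best_i, hi)

    return recurse(0, len(points) - 1) + [len(points) - 1]
```

For an L-shaped run of grid points from (0, 0) through (4, 0) to (4, 4), `split_curve(points, 0.5)` returns `[0, 4, 8]`: the corner is the only breakpoint added.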
Fig. 8.4 Stages in the recursive linear segmenter (panels a through e; see text).
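For contrast with the splitting scheme of Algorithm 8.1, two of the one-pass merging schemes described earlier can be sketched as follows. Both are minimal illustrations with hypothetical names and parameters. `angle_breakpoints` follows the angle-monitoring idea attributed to [Shirai 1975], measuring the bend at each point as the turn from the backward chord to the forward chord. `merge_segments` follows the running best-fit idea of [Roberts 1965], but its orthogonal (total) least-squares line and its use of only the accumulated-error test are choices made here, not details from the text.

```python
import math

def angle_breakpoints(points, k=2, threshold_deg=30.0):
    """Breakpoints where the bend between the backward and forward chords peaks.

    points: (x, y) tuples along an open curve; k is how many points back and
    forward the chords reach. The bend is the turn, in degrees, from the chord
    (i - k -> i) to the chord (i -> i + k); a straight run gives 0.
    """
    n = len(points)
    bend = [0.0] * n
    for i in range(k, n - k):
        (xb, yb), (x, y), (xf, yf) = points[i - k], points[i], points[i + k]
        d = math.atan2(yf - y, xf - x) - math.atan2(y - yb, x - xb)
        d = (d + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
        bend[i] = abs(math.degrees(d))
    return [i for i in range(k, n - k)
            if bend[i] > threshold_deg
            and bend[i] >= bend[i - 1] and bend[i] > bend[i + 1]]

class RunningLineFit:
    """Running sums for an orthogonal least-squares line fit."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def error(self):
        """Sum of squared perpendicular distances to the best-fit line."""
        if self.n < 3:
            return 0.0                      # two points always fit exactly
        cxx = self.sxx - self.sx * self.sx / self.n
        cyy = self.syy - self.sy * self.sy / self.n
        cxy = self.sxy - self.sx * self.sy / self.n
        half_trace = (cxx + cyy) / 2
        root = math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
        return max(half_trace - root, 0.0)  # smallest eigenvalue of the scatter matrix

def merge_segments(points, max_error):
    """One pass: grow a segment until its accumulated error exceeds max_error."""
    segments, current, fit = [], [], RunningLineFit()
    for i, (x, y) in enumerate(points):
        fit.add(x, y)
        current.append(i)
        if fit.error() > max_error:
            current.pop()                   # point i does not fit; close the segment
            segments.append(current)
            current, fit = [i], RunningLineFit()
            fit.add(x, y)
    if current:
        segments.append(current)
    return segments
```

On an L-shaped run of grid points, `angle_breakpoints` flags the corner, while `merge_segments` starts its second segment one point after the corner, illustrating the text's remark that merging introduces breakpoints only after the fit has deteriorated.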

The area of a polygon may easily be computed from its polyline representation [Roberts 1965]. For a closed polyline of n points $(x_i, y_i)$, $i = 0, \ldots, n-1$, labeled clockwise around a polygonal boundary, the area of the polygon is
$$A = \frac{1}{2} \sum_{i=0}^{n-1} \left( x_{i+1}\, y_i - x_i\, y_{i+1} \right) \qquad (8.1)$$

where subscript calculations are modulo n. This formula can be proved by considering it as the sum of (signed) areas of triangles, each with a vertex at the origin, or of parallelograms constructed by dropping perpendiculars from the polyline points to an axis. This method specializes to chain codes, which are a limiting case of polylines.

8.2.2 Chain Codes

Chain codes [Freeman 1974] consist of line segments that must lie on a fixed grid with a fixed set of possible orientations. This structure may be efficiently represented because of the constraints on its construction. Only a starting point is represented by its location; the other points on a curve are represented by successive displacements from grid point to grid point along the curve. Since the grid is uniform, direction is sufficient to characterize displacement. The grid is usually considered to be four- or eight-connected; directions are assigned as in Fig. 8.5, and each direction can be represented in 2 or 3 bits (it takes 18 bits to represent the starting point in a 512 × 512 image).

Chain codes may be made position-independent by ignoring the "start point." If they represent closed boundaries they may be "start-point normalized" by choosing the start point so that the resulting sequence of direction codes forms
an integer of minimum magnitude. These normalizations may help in matching. Periodic correlation (Section 3.2.1) can provide a measure of chain code similarity. The chain codes without their start point information are considered to be periodic functions of "arc length." (Here the arc length is just the number of steps in the chain code.) The correlation operation finds the (arc length) displacement of the functions at which they match up best as well as quantifying the goodness of the match. It can be sensitive to slight differences in the code.

The "derivative" of the chain code is useful because it is invariant under boundary rotation. The derivative (really a first difference mod 4 or 8) is simply another sequence of numbers indicating the relative direction of chain code segments: the number of left-hand turns of π/2 or π/4 needed to achieve the direction of the next chain segment.

Chain codes are also well suited for merging of regions [Brice and Fennema 1970] using the data structure described in Section 5.4.1. However, the pleasant properties for merging do not extend to union and intersection. Chain codes lend themselves to efficient calculation of certain parameters of the curves, such as area. Algorithm 8.2 computes the area enclosed by a four-neighbor chain code.

Algorithm 8.2: Chain Code Area

Comment: For a four-neighbor code (0: +x, 1: +y, 2: -x, 3: -y) surrounding a region in a counterclockwise sense, with starting point (x, y):

begin ChainArea;
  1. area := 0;
  2. yposition := y;
  3. For each element of chain code
       case element direction of
       begin case
         [0] area := area - yposition;
         [1] yposition := yposition + 1;
         [2] area := area + yposition;
         [3] yposition := yposition - 1;
       end case;
end ChainArea;

To merge two region boundaries is to remove any boundary they share, obtaining a boundary for the region resulting from gluing the two abutting regions together.
As we saw in Chapter 5, the chain codes for neighboring regions are closely related at their common boundary, being equal and opposite in a clearly defined sense (for N-neighbor chain codes, one number is equal to the other plus N/2 modulo N; see Chapter 5). This property allows such sections to be identified readily, and easily scissored out to give a new merged boundary. As with polylines, it is not immediately obvious from a chain-coded boundary and a point whether the point is within the boundary or outside. Many algorithms for use with chain code representations may be found in [Freeman 1974; Gallus and Neurath 1970].
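The chain code operations above translate directly into code. A minimal sketch, assuming a closed four-neighbor code stored as a Python list of direction numbers: `chain_area` follows Algorithm 8.2, `chain_derivative` takes the first difference mod 4 cyclically (treating the code as periodic is a choice made here), `normalize_start` picks the rotation whose digit string is smallest, and `polygon_area` evaluates Eq. (8.1) for comparison. All names are hypothetical.

```python
def chain_area(code, start_y=0):
    """Algorithm 8.2: area enclosed by a counterclockwise four-neighbor chain code.

    Directions: 0 = +x, 1 = +y, 2 = -x, 3 = -y. For a closed code the result
    does not depend on start_y.
    """
    area, y = 0, start_y
    for c in code:
        if c == 0:
            area -= y
        elif c == 1:
            y += 1
        elif c == 2:
            area += y
        elif c == 3:
            y -= 1
        else:
            raise ValueError(f"not a four-neighbor direction: {c}")
    return area

def chain_derivative(code, n_dirs=4):
    """Left turns (in units of 2*pi/n_dirs) from each link to the next, cyclically."""
    n = len(code)
    return [(code[(i + 1) % n] - code[i]) % n_dirs for i in range(n)]

def normalize_start(code):
    """Rotation of a closed chain code whose digit string is smallest (minimum integer)."""
    return min(tuple(code[i:] + code[:i]) for i in range(len(code)))

def polygon_area(xs, ys):
    """Eq. (8.1): area of a polygon whose n vertices are listed clockwise."""
    n = len(xs)
    return 0.5 * sum(xs[(i + 1) % n] * ys[i] - xs[i] * ys[(i + 1) % n]
                     for i in range(n))
```

For the counterclockwise code `[0, 0, 1, 2, 2, 3]` of a 2 × 1 rectangle, `chain_area` gives 2, matching `polygon_area` on the same rectangle's corners listed clockwise, and the derivative `[0, 1, 1, 0, 1, 1]` sums to four left turns, one full revolution.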

Fig. 8.5 (a) Direction numbers for chain code elements. (b) Chain code for the boundary shown. (c) Derivative of (b).

8.2.3 The ψ-s Curve

The ψ-s curve is like a continuous version of the chain code representation; it is the basis for several measures of shape. ψ is the angle made between a fixed line and a tangent to the boundary of a shape. It is plotted against s, the arc length of the boundary traversed. For a closed boundary, the function is periodic, with a discontinuous jump from 2π back to 0 as the tangent reattains the angle of the fixed line after traversing the boundary.

Horizontal straight lines in the ψ-s curve correspond to straight lines on the boundary (ψ is not changing). Nonhorizontal straight lines correspond to segments of circles, since ψ is changing at a constant rate. Thus the ψ-s curve itself may be segmented into straight lines [Ambler et al. 1975], yielding a segmentation of the boundary of the shape in terms of straight lines and circular arcs (Fig. 8.6).
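A minimal sketch of the ψ-s curve for a closed polygon, where the curve is a step function: ψ is constant along each edge (a horizontal run) and jumps at each vertex. Assumptions made here: vertices are listed counterclockwise, ψ is measured from the +x axis, and ψ is unwrapped so that it rises by 2π over one traversal instead of jumping back to 0. The function name is hypothetical.

```python
import math

def psi_s_curve(vertices):
    """Tangent angle psi at the start of each edge of a closed polygon.

    vertices: counterclockwise (x, y) list. Returns (s_values, psi_values, P),
    where s is the arc length at the start of each edge and P is the perimeter.
    """
    n = len(vertices)
    s_values, psi_values = [], []
    s, psi_prev = 0.0, None
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        psi = math.atan2(y1 - y0, x1 - x0)
        if psi_prev is not None:
            # Unwrap: add the signed turn (in [-pi, pi)) to the previous edge angle.
            turn = (psi - psi_prev + math.pi) % (2 * math.pi) - math.pi
            psi = psi_prev + turn
        s_values.append(s)
        psi_values.append(psi)
        s += math.hypot(x1 - x0, y1 - y0)
        psi_prev = psi
    return s_values, psi_values, s
```

For the unit square the four edges give ψ = 0, π/2, π, 3π/2 at s = 0, 1, 2, 3 with P = 4. Subtracting 2πs/P from each ψ value gives the detrended function used for the Fourier shape description of [Barrow and Popplestone 1971].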

Fig. 8.6 ψ-s segmentation. (a) Triangular curve and a tangent. (b) ψ-s curve showing regions of high curvature. (c) Resultant segmentation.


Sec.8.2 Boundary Representations 2 3 7

8.2.4 Fourier Descriptors

Fourier descriptors represent the boundary of a region as a periodic function which can be expanded in a Fourier series. There are several possible parameterizations, summarized in [Persoon and Fu 1974]. These frequency-domain descriptions provide an increasingly accurate characterization of shape as more coefficients are included. In the infinite limit, they are unambiguous; individual coefficients are descriptive representations indicating "lobedness" of various degrees.

The boundary itself may provide the parameters for the Fourier transform, as shown in Fig. 8.7. The parameterization of Fig. 8.7 gives the following series expansions:

$$x(s) = \sum_{k=-\infty}^{\infty} X_k e^{jkw_0 s}, \qquad w_0 = \frac{2\pi}{P}, \quad P = \text{perimeter} \tag{8.2}$$

where the discrete Fourier coefficients $X_k$ are given by

$$X_k = \frac{1}{P}\int_0^P x(s)\, e^{-jkw_0 s}\, ds \tag{8.3}$$

A common feature of the Fourier descriptors is that the general shape is typically given rather well by a few of the low-order terms in the expansion of the boundary curve. Properly parameterized, the coefficients are independent of size, translation, and rotation of the shape to be described. The descriptors do not lend themselves well to reconstruction of the boundary; for one thing, the resulting curve may not be closed if only a finite number of coefficients is used for the reconstruction.

The ψ-s curve may be used as the basis for a Fourier transform shape description [Barrow and Popplestone 1971]. ψ(s) is converted to φ(s): $\phi(s) = \psi(s) - 2\pi s/P$. This operation subtracts out the rising component. A number of shape-indicating numbers arise from taking the root-mean-square amplitudes of the Fourier components of φ(s), discarding phase information. The shape descriptors are again indicative of the "lobedness" of the shape.
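A discrete sketch of Eq. (8.3), assuming the boundary is sampled at N points equally spaced in arc length so that the integral becomes an average over the samples. It packs the two coordinates into one complex signal z = x + jy, a common variant of the parameterization in Eq. (8.2) rather than necessarily the one in Fig. 8.7, and then forms the magnitude ratios |Z_k|/|Z_1|, one simple way to obtain the size, translation, rotation, and start-point independence mentioned above. Function names are hypothetical.

```python
import cmath
import math


def fourier_descriptors(points, num_terms):
    """Approximate Eq. (8.3) for a closed boundary.

    points: (x, y) samples assumed equally spaced in arc length, so the
    integral over s becomes an average over the N samples. The two
    coordinates are packed into one complex signal z = x + jy.
    Returns a dict {k: Z_k} for -num_terms <= k <= num_terms.
    """
    n = len(points)
    z = [complex(x, y) for x, y in points]
    return {
        k: sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)) / n
        for k in range(-num_terms, num_terms + 1)
    }


def invariant_signature(coeffs):
    """|Z_k| / |Z_1| for k != 0.

    Translation changes only Z_0; rotation and a shifted start point change
    only phases; scaling multiplies every Z_k by the same factor.
    """
    ref = abs(coeffs[1])
    return {k: abs(c) / ref for k, c in coeffs.items() if k != 0}
```

Dropping the phases is what buys the invariance; it is also why these ratios alone cannot reconstruct the boundary.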

Fig. 8.7 Parameterization for Fourier series expansion (boundary point $(x_1(s), x_2(s))$).

8.2.5 Conic Sections

Polynomials are a natural choice for curve representation, and certain polynomials of degree 2 (namely, circles and ellipses) are closed curves and hence define regions. Circles may be represented with three parameters, ellipses by five, and general conics by six. Thus the coefficients or parameters of conic sections are terse representations. Conics are often good models for physical curves such as the edges of manufactured objects. Conics are commonly used to represent general curves approximately [Paton 1970]. Conics have some annoying properties, however; an important one is the difficulty of producing a well-behaved conic from noisy data to be fitted. Unless one is careful in defining the error measure [Turner 1974], a "least-squared-error" fit of a conic to data points yields a conic which is a nonintuitive shape or even of a surprising type (such as a hyperbola when an ellipse was expected). Conic representations and algorithms are explored in Appendix 1.
8.2.6 B-Splines

Interpolative techniques may be used to yield approximate representations. B-splines are a popular choice of piecewise polynomial interpolant. Introduced in computer-aided design and computer graphics, these classes of curves provide adequate aesthetic content for much design and also have many useful analytic properties. Usually, the fact that the curves are "interpolating" is not very relevant. What is relevant is that they have predictable properties which make them easy to manipulate in image processing, that they "look good" to human beings, that they closely approximate curves of interest in nature, and so forth. Several schemes exist for constructing complex curves that are useful in geometric modeling, and detailed expositions are to be found in [de Boor 1978; Barnhill and Riesenfeld 1974]. The B-spline formulation is one of the simplest that still has properties useful for interactive modeling and for extraction from raw data.

B-splines are piecewise polynomial curves which are related to a guiding polygon. Cubic polynomials are the most frequently used for splines, since they are the lowest order in which the curvature can change sign. An example of the relationship between the guiding polygon and its spline curve is shown in Fig. 8.8. Splines are useful in computer vision because they allow accurate, manipulable internal models of complex shapes. The models may be used to guide and monitor segmentation and recognition tasks. Interactive generation of complex shape models is possible with B-splines, and the fact that the complex spline curves have terse representations (as their guiding polygons) allows programs to manipulate them easily.

Spline approximations have good computational properties as well as good representational ones. First, they are variation diminishing. This means that the curve is guaranteed to "vary less" than its guiding polygon (many interpolation schemes have a tendency to oscillate between sample points). In fact, the curve is guaranteed to lie within the convex hull of groups of n + 1 consecutive points, where n is the degree of the interpolating polynomial (Fig. 8.9). The second advantage is that the interpolation is local; if a point on the guiding polygon is moved, the effects are intuitive and limited to nearby points on the spline. A third advantage is directly related to its use in vision: a technique for matching a spline-represented boundary curve against raw data is to search perpendicular to the spline for edges whose direction is parallel to the spline curve and whose location is perpendicular to the spline curve. Perpendicular and parallel directions are computable directly from the parameters representing the spline.

Fig. 8.8 A spline curve and its guiding polygon.

Fig. 8.9 The spline of degree n must lie in the convex hull formed by consecutive groups of n + 1 points: (a) n = 1 (linear), (b) n = 2 (quadratic), (c) n = 3 (cubic).

B-Spline Mathematics

The interpolant through a given set of points $x_i$, $i = 1, \ldots, n$, is $x(s)$, a vector-valued piecewise polynomial function of the parameter s; s changes uniformly between data points. For convenience, assume that $x(i) = x_i$; that is, s assumes integer values at data points, and $s = 1, \ldots, n$. Each piece of $x(s)$ is a cubic polynomial. Globally, $x(s)$ has three orders of continuity across data points (i.e., up to continuity of the second derivative: curvature). Formally, $x(s)$ is defined as
$$x(s) = \sum_{i=0}^{n+1} v_i B_i(s) \tag{8.4}$$

The $v_i$ are coefficients representing the curve $x(s)$. They also turn out to be the vertices of the guiding polygon. They are a dual to the set of points $x_i$; each can be derived from the other. The n data points $x_i$ determine n v's. There are actually n + 2 v's; the additional two coefficients are determined from boundary conditions. For example, if the curvature at the end points is to be 0,

$$v_1 = \tfrac{1}{2}(v_0 + v_2), \qquad v_n = \tfrac{1}{2}(v_{n-1} + v_{n+1}) \tag{8.5}$$

Thus only n of the n + 2 coefficients are selectable.

The basis functions $B_i(s)$ are nonnegative and have a limited support; that is, each $B_i$ is nonzero only for s between i - 2 and i + 2, as shown in Fig. 8.10. The limited support means that on a given span (i, i + 1) there are only four basis functions that are nonzero, namely $B_{i-1}(s)$, $B_i(s)$, $B_{i+1}(s)$, and $B_{i+2}(s)$. Figure 8.11 shows this configuration. Thus, to calculate $x(s_0)$ for some $s_0$, simply find in which span it resides, and then use only four terms in the summation (8.4), since there are only four basis functions which are nonzero there.

The basis functions $B_i(s)$ are, themselves, piecewise cubic polynomials, and their definition depends on the relative size (in parameter space) of the spans under their support. If the spans are of uniform size (e.g., unity), then all the basis functions have the same form and are merely translates of each other. Moreover, each of the basis functions, on its nonzero support, is made of four pieces. So, in Fig. 8.11, in the span (i, i + 1) appear the fourth piece of $B_{i-1}(s)$, the third piece of $B_i(s)$, the second piece of $B_{i+1}(s)$, and the first piece of $B_{i+2}(s)$. Call these pieces $C_{i-1,3}(s), \ldots, C_{i+2,0}(s)$, respectively; then $x(s)$ on the interval (i, i + 1) is given by:

$$x(s) = C_{i-1,3}(s)\,v_{i-1} + C_{i,2}(s)\,v_i + C_{i+1,1}(s)\,v_{i+1} + C_{i+2,0}(s)\,v_{i+2}$$

No matter what i is, $C_{i,j}$ will have the same shape; this property allows a simplification in calculations. Define four primitive basis functions, and interpolate along the curve by parameter shifting:

$$C_{i,j}(s) = C_j(s - i), \qquad i = 0, \ldots, n + 1; \quad j = 0, 1, 2, 3 \tag{8.6}$$

Fig. 8.10 Uniform B-spline $B_i(s)$. Its support is nonzero only for s between i - 2 and i + 2.

Fig. 8.11 The only four basis functions that are nonzero over the span (i, i + 1). Only the overlapping parts on this span are shown.

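To find $x(s_0)$, if $s_0$ is in the span (i, i + 1), use the formula:

$$x(s) = v_{i-1}C_3(s - i) + v_iC_2(s - i) + v_{i+1}C_1(s - i) + v_{i+2}C_0(s - i) \tag{8.7}$$

where the $C_j(t)$ are given by:

$$C_0(t) = \frac{t^3}{6}, \quad C_1(t) = \frac{-3t^3 + 3t^2 + 3t + 1}{6}, \quad C_2(t) = \frac{3t^3 - 6t^2 + 4}{6}, \quad C_3(t) = \frac{-t^3 + 3t^2 - 3t + 1}{6}$$

Formal derivations may be found in [Barnhill and Riesenfeld 1974; de Boor 1978].

Useful Formulae

The formulae may be simplified still further. $x(s)$ is calculated in pieces (segments); define the segments $x_i(t)$, where t ranges from 0 to 1. Then

$$x_i(0) = x_i \quad \text{and} \quad x_i(1) = x_{i+1} \qquad \text{for } i = 1, \ldots, n - 1 \tag{8.8}$$

In matrix notation, and explicitly calculating the definition of the cubic polynomials $C_j(t)$,

$$x_i(t) = \begin{bmatrix} t^3 & t^2 & t & 1 \end{bmatrix} [C] \begin{bmatrix} v_{i-1} & v_i & v_{i+1} & v_{i+2} \end{bmatrix}^T \tag{8.9}$$

where $[C]$ is the matrix

$$[C] = \frac{1}{6}\begin{bmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 0 & 3 & 0 \\ 1 & 4 & 1 & 0 \end{bmatrix}$$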
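The following Python sketch evaluates a uniform cubic B-spline both ways, via the primitive pieces in Eq. (8.7) and via the matrix form (8.9), so the two can be checked against each other. Assumptions: points are (x, y) tuples, the guiding polygon is given as v_0, ..., v_{n+1} with at least four vertices, and s lies in [1, n]; the function names are illustrative.

```python
def basis_pieces(t):
    """C_0(t)..C_3(t) of the uniform cubic B-spline, for t in [0, 1]."""
    c0 = t ** 3 / 6
    c1 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
    c2 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
    c3 = (1 - t) ** 3 / 6
    return c0, c1, c2, c3


def spline_point(v, s):
    """Eq. (8.7): v holds the guiding-polygon vertices v_0..v_{n+1} as (x, y)
    tuples, and s must lie in [1, n]."""
    n = len(v) - 2
    if not 1 <= s <= n:
        raise ValueError("s must lie in [1, n]")
    i = min(int(s), n - 1)          # s = n belongs to the last span (n-1, n)
    c0, c1, c2, c3 = basis_pieces(s - i)
    weights = (c3, c2, c1, c0)      # for v_{i-1}, v_i, v_{i+1}, v_{i+2}
    active = v[i - 1:i + 3]
    return tuple(sum(w * p[d] for w, p in zip(weights, active)) for d in range(2))


# Eq. (8.9), without the 1/6 factor: rows multiply t^3, t^2, t, 1;
# columns hold the coefficients of C_3, C_2, C_1, C_0.
C_MATRIX = [[-1, 3, -3, 1],
            [3, -6, 3, 0],
            [-3, 0, 3, 0],
            [1, 4, 1, 0]]


def segment_point(v, i, t):
    """x_i(t) = [t^3 t^2 t 1] [C] [v_{i-1} v_i v_{i+1} v_{i+2}]^T, 0 <= t <= 1."""
    row = (t ** 3, t ** 2, t, 1)
    weights = [sum(row[r] * C_MATRIX[r][c] for r in range(4)) / 6 for c in range(4)]
    active = v[i - 1:i + 3]
    return tuple(sum(w * p[d] for w, p in zip(weights, active)) for d in range(2))
```

At t = 0 the weights reduce to (1/6, 4/6, 1/6, 0), which is where the 1-4-1 rows relating the v's to the data points come from.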

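The columns of the matrix $[C]$ in Eq. (8.9) are the coefficients of the cubic polynomials $C_j(t)$ (j = 0, 1, 2, 3), taken in the order $C_3, C_2, C_1, C_0$ to match $v_{i-1}, \ldots, v_{i+2}$. There is a distinction between open and closed curves. For open curves the boundary conditions must be used to solve for the two additional coefficients, as above. For closed curves, simply

$$v_0 = v_n \quad \text{and} \quad v_{n+1} = v_1 \tag{8.10}$$

The relation between the different $v_i$ and $x_i$ is summarized as follows. For open curves with zero curvature at the end points:

$$\begin{bmatrix} 6 & 0 & & & \\ 1 & 4 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & 4 & 1 \\ & & & 0 & 6 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{n-1} \\ v_n \end{bmatrix} = 6 \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}$$

and for closed curves:

$$\begin{bmatrix} 4 & 1 & & & 1 \\ 1 & 4 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & 4 & 1 \\ 1 & & & 1 & 4 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{n-1} \\ v_n \end{bmatrix} = 6 \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix} \tag{8.11}$$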
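To fit a closed B-spline through given points, the cyclic system (8.11) must be solved for the v's. The sketch below uses Jacobi iteration purely for brevity (an assumption, not a method from the text). It converges because each row is strictly diagonally dominant (4 > 1 + 1); a cyclic tridiagonal solver would be the usual choice for large n. `closed_spline_vertices` is a hypothetical name.

```python
def closed_spline_vertices(points, tol=1e-12, max_iter=200):
    """Solve Eq. (8.11): v_{i-1} + 4 v_i + v_{i+1} = 6 x_i, with indices taken
    cyclically as in Eq. (8.10). points is a list of (x, y) data points.

    Jacobi iteration: each sweep updates every v_i from its two neighbors in
    the previous sweep; the error shrinks by at least a factor of 2 per sweep.
    """
    n = len(points)
    v = [tuple(p) for p in points]            # initial guess: the data points
    for _ in range(max_iter):
        new_v = []
        for i in range(n):
            prev, nxt = v[i - 1], v[(i + 1) % n]   # v[-1] wraps around
            new_v.append(tuple((6 * points[i][d] - prev[d] - nxt[d]) / 4
                               for d in range(2)))
        change = max(abs(a[d] - b[d]) for a, b in zip(new_v, v) for d in range(2))
        v = new_v
        if change < tol:
            break
    return v
```

For the unit square the guiding polygon comes out as the square scaled by 1.5 about its center, so v_1 is (-0.25, -0.25).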

Equation (8.11) gives the relationship between the points on the guiding polygon and the points on the spline. It may be derived from Eq. (8.9) with t = 0 (see exercises). To interpolate between these points, use a value of t between the extremes of 0 and 1. Choosing $t = k\,\delta t$ for $k = 0, \ldots, n$, where $n\,\delta t = 1$, and substituting into Eq. (8.9) yields

$$x_i(k\,\delta t) = \begin{bmatrix} (k\,\delta t)^3 & (k\,\delta t)^2 & k\,\delta t & 1 \end{bmatrix} [C] \begin{bmatrix} v_{i-1} & v_i & v_{i+1} & v_{i+2} \end{bmatrix}^T \tag{8.12}$$

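This can be decomposed [Wu et al. 1977; Gordon 1969] into the following equation:

$$x_i(k\,\delta t) = \begin{bmatrix} \binom{k}{3} & \binom{k}{2} & k & 1 \end{bmatrix} \begin{bmatrix} 6\,\delta t^3 & 0 & 0 & 0 \\ 6\,\delta t^3 & 2\,\delta t^2 & 0 & 0 \\ \delta t^3 & \delta t^2 & \delta t & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} [C] \begin{bmatrix} v_{i-1} & v_i & v_{i+1} & v_{i+2} \end{bmatrix}^T \tag{8.13}$$

The tangent to the curve is obtained by differentiation:

$$x_i'(k\,\delta t) = \begin{bmatrix} 3(k\,\delta t)^2 & 2k\,\delta t & 1 & 0 \end{bmatrix} [C] \begin{bmatrix} v_{i-1} & v_i & v_{i+1} & v_{i+2} \end{bmatrix}^T \tag{8.14}$$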

8.2.7 Strip Trees

In many computational problems there are space-time trade-offs. A nonredundant explicit representation for a general discrete curve, such as a chain code, is terse but may be difficult to use for certain computations. On the other hand, a representation for curves may take up much space but allow operations on those curves to be very efficient. A representation with the latter property is the strip tree [Ballard 1981]. Strip trees are closed under intersection and union operations, and these operations may be efficiently implemented.

A strip tree is a binary tree. The datum at each node is an eight-tuple, of which six entries define a strip (rectangle) and two denote addresses of the sons (if any). Thus each strip is defined by a six-tuple $S(x_b, x_e, w_l, w_r)$, in which $x_b$ and $x_e$ are points with two coordinates each, as shown in Fig. 8.12. (Only five parameters are necessary to define an arbitrary rectangle, but the redundant representation proves useful in the union and intersection algorithms to follow.)

The tree can be created from any curve by the following recursive procedure, which is very similar to Algorithm 8.1.

Algorithm 8.3: Making a Strip Tree

Find the smallest rectangle with a side parallel to the line segment from $x_0$ to $x_n$ that just covers all the points. This rectangle is the datum for the root node of a tree. Pick a point $x_k$ that touches one of the sides of the rectangle. Repeat the above process for the two sublists $[x_0, \ldots, x_k)$ and $[x_k, \ldots, x_n)$. These become sons of the root node. Repeat the process until the approximation is accurate enough.

The half-open interval facilitates the computations to follow. In the example above, the point $x_k$ explicitly appears in both subtrees but implementationally need not be part of the left one. Figure 8.13 shows the strip-tree construction process.

Fig. 8.12 Strip definition.

Fig. 8.13 Strip-tree construction process. Each node is listed in the format x_b y_b x_e y_e w_l w_r; the first entry reads 3 7 20 7 5 3.

Intersecting Two Curves via Strip Trees

Consider what happens when a strip from one tree intersects a strip from another, as shown in Fig. 8.14. If the strips do not intersect, the underlying curves do not intersect. If the strips do intersect, the underlying curves may or may not. To determine which, the computation may be applied recursively. At the leaf level of the tree, defined as the primitive level, the problem can always be resolved.

Algorithm 8.4: Intersecting Two Strip Trees Representing Curves

Boolean Procedure TreeInt(T1, T2)
Begin
case intersection type of two strips T1 and T2 of
begin case
[primitive] return(true);
[null] return(false);
[possible] if T2 is the "fatter" strip
        then return(TreeInt(T1, LSon(T2)) or TreeInt(T1, RSon(T2)))
        else return(TreeInt(LSon(T1), T2) or TreeInt(RSon(T1), T2));
end case;
end;

Fig. 8.14 Types of strip intersections. (a) Two kinds of intersections: NULL on the left; various POSSIBLE intersections on the right. (b) Under certain conditions the underlying curves must intersect.
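A minimal Python sketch of Algorithms 8.3 and 8.4 for polyline curves. Assumptions, none of them from the text: the curve is a list of (x, y) points; each node also stores its four rectangle corners (beyond the six-tuple) for overlap tests; strips are compared with a separating-axis test; and the "primitive" case is resolved only at leaves, by exact segment intersection, rather than by recognizing clear crossings at higher levels as in Fig. 8.14b. With `tol = 0` every leaf is a straight piece of the polyline, so the answer is exact. Names such as `Strip`, `make_strip_tree`, and `tree_intersect` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Strip:
    xb: tuple            # first point of the (sub)curve
    xe: tuple            # last point of the (sub)curve
    wl: float            # width to the left of the directed chord xb -> xe
    wr: float            # width to the right of the chord
    corners: list        # rectangle corners, kept for overlap tests
    left: Optional["Strip"] = None
    right: Optional["Strip"] = None

    def area(self):
        a, b, c, _ = self.corners
        return math.dist(a, b) * math.dist(b, c)


def make_strip_tree(pts, tol=0.0):
    """Algorithm 8.3 (sketch) for a curve given as a list of (x, y) points."""
    xb, xe = pts[0], pts[-1]
    dx, dy = xe[0] - xb[0], xe[1] - xb[1]
    length = math.hypot(dx, dy)
    ux, uy = (dx / length, dy / length) if length > 0 else (1.0, 0.0)
    along = [(p[0] - xb[0]) * ux + (p[1] - xb[1]) * uy for p in pts]
    across = [(p[1] - xb[1]) * ux - (p[0] - xb[0]) * uy for p in pts]  # > 0: left
    wl, wr = max(0.0, max(across)), max(0.0, -min(across))
    a0, a1 = min(along), max(along)
    corners = [(xb[0] + a * ux - w * uy, xb[1] + a * uy + w * ux)
               for a, w in ((a0, -wr), (a1, -wr), (a1, wl), (a0, wl))]
    node = Strip(xb, xe, wl, wr, corners)
    if len(pts) > 2 and wl + wr > tol:
        # split at an interior point touching one of the long sides
        k = max(range(1, len(pts) - 1), key=lambda i: abs(across[i]))
        node.left = make_strip_tree(pts[:k + 1], tol)
        node.right = make_strip_tree(pts[k:], tol)
    return node


def _axes(rect):
    """Two perpendicular axes of a rectangle (also works for zero-width strips)."""
    (x0, y0), (x1, y1), (x2, y2) = rect[0], rect[1], rect[2]
    ax, ay = x1 - x0, y1 - y0
    if ax == 0 and ay == 0:
        ax, ay = x2 - x1, y2 - y1
    return [(ax, ay), (-ay, ax)] if (ax or ay) else []


def rects_overlap(r, s):
    """Separating-axis test: False only if some axis separates r and s."""
    for ax, ay in _axes(r) + _axes(s):
        pr = [x * ax + y * ay for x, y in r]
        ps = [x * ax + y * ay for x, y in s]
        if max(pr) < min(ps) or max(ps) < min(pr):
            return False
    return True


def _orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_intersect(p1, p2, q1, q2):
    """True if the closed segments p1p2 and q1q2 share at least one point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True

    def within(a, b, c):  # c is collinear with ab: is it inside ab's box?
        return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))
    return ((d1 == 0 and within(q1, q2, p1)) or (d2 == 0 and within(q1, q2, p2))
            or (d3 == 0 and within(p1, p2, q1)) or (d4 == 0 and within(p1, p2, q2)))


def tree_intersect(t1, t2):
    """Algorithm 8.4 (sketch): do the curves represented by t1 and t2 meet?"""
    if not rects_overlap(t1.corners, t2.corners):
        return False                              # NULL: strips are disjoint
    if t1.left is None and t2.left is None:       # both leaves: test the chords
        return segments_intersect(t1.xb, t1.xe, t2.xb, t2.xe)
    # POSSIBLE: descend into the fatter strip (or the only one with sons)
    if t2.left is not None and (t1.left is None or t2.area() >= t1.area()):
        return tree_intersect(t1, t2.left) or tree_intersect(t1, t2.right)
    return tree_intersect(t1.left, t2) or tree_intersect(t1.right, t2)
```

Building the tree costs time up front; afterwards, whole subtrees are discarded by a single rectangle test whenever strips are disjoint.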



The "Union" of Two Strip Trees

The "union" of two strip trees may be defined as a strip that covers both of the two root strips. The two curves defined by $[x'_0, \ldots, x'_n)$ and $[x''_0, \ldots, x''_m)$ are treated as two concatenated lists. That is, the resultant ordering is such that $x_0 = x'_0$ and $x_{m+n+1} = x''_m$. This construction is shown in Fig. 8.15.

Closed Curves Represented by Strip Trees

A region may be represented by its (closed) boundary. The strip-tree construction method described in Algorithm 8.3 works for closed curves and, incidentally, also for self-intersecting curves. Furthermore, if a region is not simply connected (has "holes"), it can still be represented as a strip tree which at some level has connected primitives.

Many useful operations on regions can be carried out with strip trees. Examples are intersection between a curve and a region and intersecting two regions. Another example is the determination of whether a point is inside a region. Roughly, if any semi-infinite line terminating at the point intersects the boundary of the region an odd number of times, the point is inside. The implied algorithm is computationally simplified for strip trees in the following manner:

Point Membership Property. To decide whether a point z is a member of a region represented by a strip tree, compute the number of nondegenerate intersections of the strip tree with any semi-infinite strip L which has ||w|| = 0 and emanates from z. If this number is odd, the point is inside the region.

This is because for clear intersections the underlying curves may intersect more than once but must intersect an odd number of times. A potential difficulty exists when the strip L is tangent to the curve. To overcome this difficulty in practice, a different L may be used.

Intersecting a Curve with a Region

The strategy behind intersecting a strip tree representing a curve with a strip tree representing a region is to create a new tree for the portion of the curve that overlaps the region. This can be done by trimming the original curve strip tree.
Trimming is done efficiently by taking advantage of an obvious property of the intersection process:

Pruning Property: Consider two strips $S_c$ from $T_c$ and $S_a$ from $T_a$. If the intersection of $S_c$ with $T_a$ is null, then (a) if any point on $S_c$ is inside $T_a$, the entire tree whose root strip is $S_c$ is inside or on $T_a$, and (b) if any point on $S_c$ is outside of $T_a$, then the entire tree whose root strip is $S_c$ is outside $T_a$.

Fig. 8.15 Construction for "union" of strip trees representing two curves.

This leads to Algorithm 8.5 for curve-region intersection using trees. If the curve strip is "fatter" (i.e., has more area), copy the node and resolve the intersection at lower levels. In the converse case, prune the tree sequentially by first intersecting it with the left region strip and then intersecting the resultant pruned tree with the right region strip.

Algorithm 8.5: Curve-Region Intersection

comment A Reference Procedure returns a pointer;
reference procedure CurveRegionInt(T1, T2)
begin
R := T2; comment R is a global used by CRInt;
return(CRInt(T1, T2));
end;

reference procedure CRInt(T1, T2)
begin
begin Case StripInt(T1, T2) of
[Null or Primitive] if Intersection(T1, R, TRUE) = null then
        if Inside(T1, R) then return(T1)
        else return(null)
    else return(T1);
[Possible] if T1 is "fatter" then
        begin
        NT := NewRecord;
        x_b(NT) := x_b(T1); x_e(NT) := x_e(T1);
        w_l(NT) := w_l(T1); w_r(NT) := w_r(T1);
        LSon(NT) := CRInt(LSon(T1), T2);
        RSon(NT) := CRInt(RSon(T1), T2);
        return(NT);
        end
    else comment T2 is "fatter";
        return(CRInt(CRInt(T1, LSon(T2)), RSon(T2)));
end Case;
end;

The problem of intersecting two regions can be decomposed into two curve-region intersection problems (Fig. 8.16). Thus Algorithm 8.5 can also be used to solve the region-region intersection problem.
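The Point Membership Property rests on the ordinary parity rule. As a minimal sketch, assuming the region boundary is available as a closed polygon rather than as a strip tree, the rule can be applied with a horizontal ray toward +x. Counting an edge only when one endpoint lies strictly above the ray's line and the other lies on or below it is one common way to sidestep the tangency difficulty mentioned above, in place of choosing a different L. `point_in_polygon` is a hypothetical name.

```python
def point_in_polygon(z, vertices):
    """Parity test: count crossings of the ray from z toward +x with the
    polygon's edges; an odd count means z is inside."""
    x, y = z
    inside = False
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        if (y0 > y) != (y1 > y):                 # edge straddles the ray's line
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > x:                      # crossing lies on the ray
                inside = not inside
    return inside
```

For a U-shaped region, a point in the notch sees two crossings (outside), while a point in either arm sees one or three (inside).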
8.3 REGION REPRESENTATIONS

8.3.1 Spatial Occupancy Array

The most obvious and quite a useful representation for a region on a raster is a membership predicate p(x, y) which takes the value 1 when point (x, y) is in the region and the value 0 otherwise. One easy way to implement such a function is with a membership array, an array of 1's and 0's with the obvious interpretation. Such arrays are quickly interrogated and also quite easily unioned, merged, and intersected by AND and OR operations, applied elementwise on the operand arrays. The disadvantages of this representation are that it requires much space and does not represent the boundary in a useful way.

Sec.8.3 Region Representations

Fig. 8.16 Decomposition of region-region intersection. (a) Desired result. (b) Portion of boundary generated by treating three-lobed region as a curve. (c) Portion of boundary generated by treating five-lobed region as a curve. (d) Result of union operation.
8.3.2 y-Axis

A representation that is more compact and which offers reasonable algorithms for intersection, merging, and union is the y-axis representation [Merrill 1973]. This is a run-length encoding of the membership array, and as such it provides no explicit boundary information. It is a list of lists. Each element on the main list corresponds to a row of constant y in the image raster. Each row of constant y is encoded as a list of x-coordinate points: the first x point at which the region is entered while moving along that y row, then the x point at which the region is exited, then the x point at which it is next entered, and so forth. The y rows with no region points are omitted from the main list. Thus, in a notation where successive levels of sublist are surrounded by successive levels of parentheses, the y-axis encoding of a region is shown in Fig. 8.17; here the first element of each sublist is the y coordinate, followed by a list of "into" and "out of" x coordinates. Where a y row contains an isolated point of the region, this point is repeated in the list of x coordinates, as shown by the example in Fig. 8.17. Thus "lines" (regions of unit width) can be easily (although not efficiently) represented in this system.

Fig. 8.17 y-axis region representation: ((2 4 5) (4 3 5) (5 3 3 5 5)).

Union and intersection are implemented on y-axis representations as merge-like operations which take time linearly proportional to the number of y rows. Two instances of y-axis representations and the representation of their union are shown in Fig. 8.18. Note that the union amounts to a merge of x elements along rows, organized within a merge of the rows themselves.

Fig. 8.18 Two point sets A, B, and A ∪ B, with their y-axis representations:
A: ((1 2 3 6 7) (2 2 7) (3 1 1 3 3) (5 1 2))
B: ((1 3 4) (2 1 5) (3 2 2 5 7) (4 2 2))
A ∪ B: ((1 2 4 6 7) (2 1 7) (3 1 3 5 7) (4 2 2) (5 1 2))

The y-axis representation is wasteful of space if the region being represented is long, thin, and parallel to the y axis. In this case one is invited to encode it in x-axis format, in an obvious extension. Working with mixed x-axis and y-axis formats presents no conceptual difficulties, but considerable loss of convenience.

8.3.3 Quad Trees

Quad trees [Samet 1980] are a useful encoding of the spatial occupancy array. The easiest way to understand quad trees is to consider pyramids as an intermediate representation of the binary array. Figure 8.19 shows a pyramid (Section 3.7) made from the base image (on the left). Each pixel in images above the lowest level has one of three values: BLACK, WHITE, or GRAY. A pixel in a level above the base is BLACK or WHITE if all its corresponding pixels in the next lower level are BLACK or WHITE, respectively. If some of the lower-level pixels are BLACK and others are WHITE, the corresponding pixel in the higher level is GRAY.

Such a pyramid is easy to construct. To convert the pyramid to a quad tree, simply search the pyramid recursively from the top to the base. If an array element in the pyramid is either BLACK or WHITE, form a terminal node of the corresponding type. Otherwise, form a GRAY node with pointers to the results of the recursive examination of the four elements at the next level in the tree (Algorithm 8.6).

Fig. 8.19 Pyramid used in quad tree construction (levels 0 through 3; legend: Black, White, Gray). Letters correspond to pixels in the pyramid that are either BLACK or WHITE.

Algorithm 8.6: Quad Tree Generation

Reference Procedure QuadTree(integer array Pyramid; integer x, y, Level);
Comment NW, NE, SW, SE are fields denoting the sons of a quad tree node;
NewNode(P);
TYPE(P) := Pyramid(IND(x, y, Level));
if TYPE(P) = BLACK or WHITE then return(P)
else begin
    SW(P) := QuadTree(Pyramid, 2*x, 2*y, Level + 1);
    SE(P) := QuadTree(Pyramid, 2*x + 1, 2*y, Level + 1);
    NW(P) := QuadTree(Pyramid, 2*x, 2*y + 1, Level + 1);
    NE(P) := QuadTree(Pyramid, 2*x + 1, 2*y + 1, Level + 1);
    return(P)
end;

An implementational point here is that the entire pyramid fits into a linear array of size $2(2^{2 \times \mathit{level}})$. IND is an indexing function which extracts the appropriate value given the x, y, and level coordinates. The reader can apply this algorithm to the example in Fig. 8.19 to verify that it creates the tree in Fig. 8.20.

The quad tree can be created directly from the base of the pyramid, but the algorithm is more involved. This is because, proceeding upward from the base, one must sometimes defer the creation of black and white nodes. This algorithm is left for the exercises [Samet 1980].

Many operations on quad trees are simple and elegant. For example, consider the calculation of area [Schneier 1979]:

Algorithm 8.7: Area of a Quad Tree

Integer Procedure Area(reference QuadTree; integer height);
Begin
Comment NW, NE, SW, SE are fields denoting the sons of a quad tree node;
BlackArea := 0;
if TYPE(QuadTree) = GRAY then
    for I in the set {NW, NE, SW, SE} do
        BlackArea := BlackArea + Area(I(QuadTree), height - 1)
else if TYPE(QuadTree) = BLACK then
    BlackArea := BlackArea + 2^(2*height);
return(BlackArea)
end;

Other examples may be found in the References and are pursued in the Exercises.

Fig. 8.20 Quad tree for the example in Fig. 8.19 (legend: Black, Gray, White).

The quad tree and the associated pyramid have two related disadvantages as a representation. The first is that the resolution cannot be extended to finer resolution after a grid size has been chosen. The second is that operations between quad trees tacitly assume that their pyramids are defined on the same grids. The grids cannot be shifted or scaled without cumbersome conversion routines.
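A compact Python sketch of quad-tree construction and of Algorithm 8.7. For brevity it builds the tree top-down directly from a 2^h x 2^h binary array instead of going through the pyramid of Algorithm 8.6; this is a substitution, and a slower one, since each block is rescanned. GRAY nodes are dicts keyed NW, NE, SW, SE, and row 0 is taken as the north edge. Function names are hypothetical.

```python
BLACK, WHITE = "BLACK", "WHITE"


def build_quadtree(img, x=0, y=0, size=None):
    """Quad tree of a 2^h x 2^h binary array img[row][col] (1 = BLACK).

    Leaves are BLACK or WHITE; a GRAY node is a dict of its four sons.
    Row 0 is taken as the north edge.
    """
    if size is None:
        size = len(img)
    block = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if all(block):
        return BLACK
    if not any(block):
        return WHITE
    h = size // 2
    return {"NW": build_quadtree(img, x, y, h),
            "NE": build_quadtree(img, x + h, y, h),
            "SW": build_quadtree(img, x, y + h, h),
            "SE": build_quadtree(img, x + h, y + h, h)}


def quadtree_area(node, height):
    """Algorithm 8.7: a BLACK leaf at height h covers 2^(2h) pixels."""
    if isinstance(node, dict):                      # GRAY: sum over the sons
        return sum(quadtree_area(son, height - 1) for son in node.values())
    return 4 ** height if node == BLACK else 0
```

The root of a 2^h x 2^h array has height h, so a 4 x 4 image is queried with `quadtree_area(tree, 2)`.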


8.3.4 Medial Axis Transform

If the region is made of thin components, it can be well described for many purposes by a "stick figure" skeleton. Skeletons may be derived by thinning algorithms that preserve connectivity of regions; the medial axis transform (MAT) of [Blum 1973; Marr 1977] is a well-known thinning algorithm. The skeleton is defined in terms of the distance of a point x to a set A:

$$d_s(\mathbf{x}, A) = \inf\{\, d(\mathbf{x}, \mathbf{z}) \mid \mathbf{z} \in A \,\} \tag{8.15}$$

Popular metrics are the Euclidean, city block, and chessboard metrics described in Chapter 2. Let B be the set of boundary points. For each point x in a region, find its closest neighbors (by some metric) on the region boundary. If more than one boundary point is the minimum distance from x, then x is on the skeleton of the region. The skeleton is the set of pairs $\{\mathbf{x}, d_s(\mathbf{x}, B)\}$, where $d_s(\mathbf{x}, B)$ is the distance from x to the boundary, as defined above (this is a definition, not an efficient algorithm). Since each x in the skeleton retains the information on the minimum distance to the boundary, the original region may be recovered (conceptually) as the union of "disks" (in the proper metric) centered on the skeleton points.

Some common shapes have simply structured medial axis transform skeletons. In the Euclidean metric, a circle has a skeleton consisting of its central point. A convex polygon has a skeleton consisting of linear segments; if the polygon is nonconvex, the segments may be parabolic or linear. A simply connected polygon has a skeleton that is a tree (a graph with no cycles). Some examples of medial axis transform skeletons appear in Fig. 8.21.

The figure shows that the skeleton is sensitive to noise in the boundary. Reducing this sensitivity may be accomplished by smoothing the boundary, using a polygonal boundary approximation, or including only those points in the skeleton that are greater than some distance from the boundary. The latter scheme can lead to disconnected skeletons.

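Algorithm 8.8: Medial Axis Transformation [Rosenfeld and Kak 1976]

Let region points have value 1 and exterior points value 0. These points define an image f(x). Let $f^0(\mathbf{x}) = f(\mathbf{x})$, and let $f^k(\mathbf{x})$ be given by

$$f^k(\mathbf{x}) = f^0(\mathbf{x}) + \min_{d(\mathbf{x}, \mathbf{z}) \le 1} f^{k-1}(\mathbf{z}), \qquad k > 0$$

The points $f^k(\mathbf{x})$ will converge when k is equal to the maximum thickness of the region. Where $f^k(\mathbf{x})$ has converged, the skeleton is defined as all points x such that

$$f^k(\mathbf{x}) \ge f^k(\mathbf{z}) \quad \text{for all } \mathbf{z} \text{ with } d(\mathbf{x}, \mathbf{z}) \le 1$$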
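Algorithm 8.8 is short enough to run directly on a small binary image. This Python sketch assumes the city-block metric, so "d(x, z) ≤ 1" means x itself plus its four edge neighbors, and treats pixels outside the image as exterior; neither choice is fixed by the text, and `medial_axis` is a hypothetical name.

```python
def medial_axis(f):
    """Algorithm 8.8 on a binary image f (list of rows of 0/1 values).

    Assumes the city-block metric, so d(x, z) <= 1 means x itself or one of
    its four edge neighbors; pixels outside the image count as exterior (0).
    Returns the converged map f^k and the skeleton as a set of (row, col).
    """
    rows, cols = len(f), len(f[0])
    nbhd = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

    def get(g, r, c):
        return g[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    fk = [row[:] for row in f]                      # f^0 = f
    while True:
        nxt = [[f[r][c] + min(get(fk, r + dr, c + dc) for dr, dc in nbhd)
                for c in range(cols)] for r in range(rows)]
        if nxt == fk:                               # converged
            break
        fk = nxt
    skeleton = {(r, c) for r in range(rows) for c in range(cols)
                if fk[r][c] > 0
                and all(fk[r][c] >= get(fk, r + dr, c + dc) for dr, dc in nbhd)}
    return fk, skeleton
```

On a 3-by-5 block of ones, the converged map is the city-block distance to the exterior, and the skeleton is the middle run of 2's plus the four corner pixels. The corners also pass the local-maximum test, which is a small instance of the fragmented skeletons noted below.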

Fig. 8.21 Medial axis transform skeletons (a), and the technique applied to human cell nuclei (b). Shown in (b) are both the "normal" skeleton, obtained by measuring distances interior to the boundaries, and the exoskeleton, obtained by measuring distances exterior to the boundary.

This algorithm can produce disconnected skeletons for excursions or lobes off the main body of the region. Elegant thinning algorithms to compute skeletons are given in [Pavlidis 1977].
8.3.5 Decomposing Complex Regions

Much work has been done on the decomposition of point sets (usually polygons) into a union of convex polygons. Such convex decompositions provide a structural analysis of a complex region that may be useful for matching different point sets. An example of the desired result in two dimensions is presented here, and the interested reader may refer to [Pavlidis 1977] for the details. Such a decomposition is not unique in general, and in three dimensions such difficulties arise that the problem is often called ill-formed or intractable [Voelcker and Requicha 1977].

The shapes of Fig. 8.22 have three "primary convex subsets," labeled X, Y, and Z. They form different numbers of "nuclei" (roughly, intersection sets). The shape is described by a graph that has nodes for nuclei and primary convex subsets and an arc between intersecting sets (Fig. 8.22c). Without nodes for the nuclei (i.e., if only primary convex subsets and their intersections are represented), regions with different topological connectedness can produce identical graphs (Fig. 8.22b).

8.4 SIMPLE SHAPE PROPERTIES

8.4.1 Area

The area of a region is a basic descriptive property. It is easily computed from curve boundary representations (Section 8.2.1) and thus also from chain codes (Section 8.2.2); their continuous analog is also useful. Consider a curve parameterized on arc length s, so that points (x, y) are given by functions (x(s), y(s)). Then, for a counterclockwise traversal,

$$\text{area} = \frac{1}{2}\int_0^P \left( x\frac{dy}{ds} - y\frac{dx}{ds} \right) ds \tag{8.16}$$

where P is the perimeter.
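For a polygonal boundary, Eq. (8.16) can be evaluated exactly edge by edge: along a straight edge from (x0, y0) to (x1, y1) the integrand integrates to x0*y1 - x1*y0, which gives the familiar "shoelace" formula. A minimal sketch, assuming the vertices are listed in traversal order; `polygon_area` is a hypothetical name, and the result is negative for a clockwise traversal.

```python
def polygon_area(vertices):
    """Discrete Eq. (8.16) for a closed polygon given as (x, y) vertices.

    On each straight edge, the integral of (x dy - y dx) is exactly
    x0*y1 - x1*y0, so summing these terms and halving gives the area.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return twice_area / 2
```

The sign doubles as an orientation test: reversing the vertex order negates the result.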
8.4.2 Eccentricity

There are several measures of eccentricity, or "elongation." One of them is the ratio of the length of maximum chord A to maximum chord B perpendicular to A (Fig. 8.23).

Another reasonable measure is the ratio of the principal axes of inertia; this measure can be based on boundary points or on the entire region [Brown 1979]. An (approximate) formula due to Tenenbaum for an arbitrary set of points starts with the mean vector

$$\mathbf{x}_0 = (x_0, y_0) = \frac{1}{n}\sum_{\mathbf{x} \in R} \mathbf{x} \tag{8.17}$$

To compute the remaining parameters, first compute the ij-th moments $M_{ij}$, defined by

$$M_{ij} = \sum_{(x, y) \in R} (x - x_0)^i (y - y_0)^j \tag{8.18}$$

The orientation, θ, is given by

$$\theta = \frac{1}{2}\tan^{-1}\left(\frac{2M_{11}}{M_{20} - M_{02}}\right) + n\left(\frac{\pi}{2}\right) \tag{8.19}$$

and the approximate eccentricity e is

$$e = \frac{(M_{20} - M_{02})^2 + 4M_{11}^2}{\text{area}} \tag{8.20}$$
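Eqs. (8.17) through (8.20) map directly onto a few sums. The sketch below assumes the region is given as a list of pixel coordinates and uses the pixel count as the area (an assumption). `math.atan2` resolves the quadrant, which plays the role of the n(π/2) term in Eq. (8.19) by selecting the branch that aligns θ with the major axis. `moment_shape` is a hypothetical name.

```python
import math


def moment_shape(points):
    """Orientation and approximate eccentricity, Eqs. (8.17)-(8.20).

    points: list of (x, y) pixel coordinates of the region R.
    Returns (theta, e); area is taken as the number of pixels.
    """
    n = len(points)
    x0 = sum(x for x, _ in points) / n          # Eq. (8.17)
    y0 = sum(y for _, y in points) / n

    def moment(i, j):                           # Eq. (8.18)
        return sum((x - x0) ** i * (y - y0) ** j for x, y in points)

    m11, m20, m02 = moment(1, 1), moment(2, 0), moment(0, 2)
    theta = 0.5 * math.atan2(2 * m11, m20 - m02)      # Eq. (8.19)
    area = n
    e = ((m20 - m02) ** 2 + 4 * m11 ** 2) / area      # Eq. (8.20)
    return theta, e
```

A horizontal bar gives θ = 0, a diagonal run of pixels gives θ = π/4, and a square block gives e = 0.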
8.4.3 Euler Number

The Euler number is a topological property defining the set of objects that are equivalent under "rubber-sheet" deformations of the plane. It describes the connectedness of a region, not its shape. A connected region is one in which all pairs of points may be connected by a curve lying entirely in the region. If a complex two-dimensional object is considered to be a set of connected regions, each of which can have holes, the Euler number for such an object is defined as

(number of connected regions) - (number of holes)

The number of holes is one less than the number of connected regions in the set complement of the object.

Fig. 8.23 An eccentricity measure: A/B.

Sec.8.4 Simple Shape Properties

8.4.4 Compactness

One measure of compactness (not compactness in the sense of point-set topology) is the ratio (perimeter²)/area, which is dimensionless and minimized by a disk. This measure is computed easily from the chain-code representation of the boundary, where the length of an individual segment of eight-neighbor chain code is √2 if the (eight-neighbor) direction is odd and 1 if the direction is even. The area is computed by a modification of Algorithm 8.2, and the perimeter may be accumulated at the same time.

For small discrete objects, this measure may not be satisfactory; another measure is based on a model of the boundary as a thin springy wire [Young et al. 1974]. The normalized "bending energy" of the wire is given by
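$$E = \frac{1}{P}\int_0^P |\kappa(s)|^2\, ds \tag{8.21}$$

where κ is curvature. This measure is minimized by a circle. E can be computed from the chain-code representation by recognizing that κ = dθ/ds, and also from the Fourier coefficients of Section 8.2.4, since

$$|\kappa(s)|^2 = \left(\frac{d^2x}{ds^2}\right)^2 + \left(\frac{d^2y}{ds^2}\right)^2 \tag{8.22}$$

so that E, using Parseval's theorem, is

$$E = \sum_{k=-\infty}^{\infty} (k w_0)^4 \left(|X_k|^2 + |Y_k|^2\right) \tag{8.23}$$

where $(X_k, Y_k)$ are the Fourier descriptor coefficients in (8.2).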
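A sketch of the chain-code route to the (perimeter²)/area measure described above. Assumptions: eight-neighbor directions are numbered counterclockwise from +x (0: +x, 2: +y, 4: -x, 6: -y, odd codes diagonal), and the code traces a closed boundary counterclockwise. The area uses an incremental shoelace sum, which is one way to realize the "modification of Algorithm 8.2" for diagonal steps; `chain_compactness` is a hypothetical name.

```python
import math

# Unit steps for eight-neighbor directions 0..7, counterclockwise from +x.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_compactness(code):
    """perimeter**2 / area for a closed, counterclockwise eight-neighbor code.

    Odd (diagonal) steps contribute sqrt(2) to the perimeter and even steps 1;
    the area accumulates the shoelace term x*dy - y*dx at each step.
    """
    x = y = 0
    perimeter = 0.0
    twice_area = 0
    for d in code:
        dx, dy = STEPS[d]
        twice_area += x * dy - y * dx
        x, y = x + dx, y + dy
        perimeter += math.sqrt(2) if d % 2 else 1.0
    return perimeter ** 2 / (twice_area / 2)
```

A 2-by-2 square scores 16 and a 4-by-1 rectangle scores 25, while a continuous disk would reach the lower bound 4π ≈ 12.57.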


8.4.5 Slope Density Function

The ψ-s curve can be the basis for the slope density function (SDF) [Nahin 1974]. The SDF is the histogram or frequency distribution of ψ collected over the boundary. An example is shown in Fig. 8.24. The SDF is flat for a circle (or, in a continuous universe, any shape with a monotonically varying ψ); straight sides stand out sharply, as do sharp corners, which in a continuous universe leave gaps in the histogram. The SDF is the signature of the ψ-s curve along the ψ axis.
Fig. 8.24 The slope density function for three curves: a triangular blob, a circle, and a square.

8.4.6 Signatures

By definition, a projection is not an information-preserving transformation. But Section 2.3.3 showed that (as with Fourier descriptors) enough projections allow reconstruction of the region to any desired degree of accuracy. (This observation forms the basis for computer-assisted tomography.) Given a binary image f(x) = 0 or 1, define the horizontal signature p(x) as

$$p(x) = \sum_y f(x, y) \tag{8.24}$$

p(x) is simply the projection of f onto the x axis. Similarly, define p(y), the vertical signature, as

$$p(y) = \sum_x f(x, y) \tag{8.25}$$
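Eqs. (8.24) and (8.25) are column and row sums. A minimal sketch, assuming the image is a list of rows indexed f[y][x]; the function name is hypothetical. Landmarks could then be sought at the extrema of the two returned lists.

```python
def signatures(f):
    """Horizontal and vertical signatures, Eqs. (8.24) and (8.25).

    f is an image given as a list of rows, so f[y][x] is the pixel at (x, y).
    Returns (p_of_x, p_of_y).
    """
    horizontal = [sum(row[x] for row in f) for x in range(len(f[0]))]  # p(x)
    vertical = [sum(row) for row in f]                                  # p(y)
    return horizontal, vertical
```

Both lists sum to the total image mass, which is the area for a binary image. As the text notes, the same sums apply unchanged to a density image.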

Maxima and minima of signatures are often useful for establishing preliminary landmarks in an image to reduce subsequent search effort [Kruger et al. 1972] (Fig. 8.25). If the region is not binary, but consists of a density function, Eq. (8.24) may still be used. Polar projections may be useful characterizations if the point of projection is chosen carefully.

Another idea is to provide a number of projections, $q_1, \ldots, q_n$, the ith one based on the ith sublist in each row in a y-axis-like region representation. This technique is more sensitive to nonconvexities and holes than is a regular projection (Fig. 8.26).

Fig. 8.25 The use of signatures to locate a left ventricle cross section in ultrasound data (panel title: "Heart Analysis: Papillary Muscles"). (Outer curves are smoothed versions of inner signatures.)

Fig. 8.26 A shape (a) and projections from the first (b) and second (c) sublists of the y-axis representation.

8.4.7 Concavity Tree

Concavity trees [Sklansky 1972] represent information necessary to fill in local indentations of the boundary as far as the convex hull and to study the shape of the resultant concavities.

A region S is convex iff for any $x_1$ and $x_2$ in S, the straight line segment connecting $x_1$ and $x_2$ is also contained in S. The convex hull of an object S is the smallest H such that S ⊆ H and H is convex. Figure 8.27 shows a region, the steps in the derivation of the concavity tree, and the concavity tree itself.

8.4.8 Shape Numbers

For closed curves and a 3-bit chain code (together with a controlled digitization scheme), many chain-coded boundaries can be given a unique shape number [Bribiesca and Guzman 1979]. The shape number is related to the resolution of the digitization scheme. In a multiple-resolution pyramid of digitization grids, every possible shape can be represented as a path through a tree. At each grid resolution, corresponding to a level in the tree, there are a finite number of possible shapes. Moving up the tree, the coarser grids tend to blur distinctions between different shapes until at some resolution they are identical. This level can be used as a similarity measure between shapes.

The basic idea behind shape numbers is the following. Consider all the possible closed boundaries with n chain segments. These form the possible shapes of "order n." The chain encoding for a particular boundary can be made unique by interpreting the chain-code direction sequence as a number and picking the start point that minimizes this number. Notice that the orders of shape numbers must be even on rectangular grids, since a curve of odd order cannot close. Algorithm 8.9 generates a shape number of order n.

[Figure: an object, its concavities, and the corresponding concavity tree]
Fig. 8.27 Concavities of an object and the concavity tree.
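Returning to the concavity tree of Section 8.4.7, a minimal sketch follows for a simple polygon given as a list of vertices. It assumes distinct vertices and no three collinear vertices, and uses Andrew's monotone-chain convex hull, a standard algorithm swapped in here rather than one the text prescribes. A concavity is the stretch of boundary between two consecutive hull vertices that are not neighbors on the polygon; recursing on each concavity (closed by its hull edge) yields the tree. The names `convex_hull`, `concavities`, and `concavity_tree` are hypothetical.

```python
def cross(o, a, b):
    """z component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone-chain convex hull; collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def concavities(polygon):
    """Boundary chains between consecutive hull vertices not adjacent on the polygon.

    Each chain runs in polygon order; closing it with the hull edge joining its
    endpoints gives one concavity region.
    """
    hull = set(convex_hull(polygon))
    n = len(polygon)
    on_hull = [i for i, p in enumerate(polygon) if p in hull]
    chains = []
    for k, i in enumerate(on_hull):
        j = on_hull[(k + 1) % len(on_hull)]
        if (i + 1) % n == j:
            continue  # the boundary follows the hull along this edge
        chain, m = [polygon[i]], (i + 1) % n
        while m != j:
            chain.append(polygon[m])
            m = (m + 1) % n
        chain.append(polygon[j])
        chains.append(chain)
    return chains


def concavity_tree(polygon):
    """The root is the region; its children are the trees of its concavities."""
    return {"region": polygon,
            "children": [concavity_tree(c) for c in concavities(polygon)]}


# A square with a V-shaped notch cut into its top edge
notched = [(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)]
tree = concavity_tree(notched)
print(concavities(notched))  # [[(4, 4), (2, 2), (0, 4)]]
```

The recursion terminates because each concavity chain contains only two of the parent's (at least three) hull vertices, so every child has fewer vertices than its parent.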

Algorithm 8.9: Making a Shape Number of Order n

1. Choose the maximal diameter of the shape as one of the coordinate axes.
2. Find the smallest rectangle that has a side parallel to this axis and just covers the shape.
3. From the possible rectangles of order n, find the one that best approximates the rectangle in step 2. Scale this rectangle so that the length of the longest side equals that of the major axis, and center it over the shape.
4. Set all the pixels falling more than 50% inside the region to 1, and the rest to 0.
5. Find the derivative of the chain-encoded boundary of the region of 1's from step 4.
6. Normalize this number by rotating the digits until the number is minimum. The normalized number is the shape number.

Figure 8.28 shows these steps.
[Figure: steps (1) through (6) for a shape of order 26: major axis and covering rectangle, digitized region, chain code and its derivative, and the resulting shape number 00202011110201020020212011]
Fig. 8.28 Steps in determining a shape number (see text).
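Steps 5 and 6 of Algorithm 8.9 are easy to sketch once the digitized boundary has been traced. The sketch below assumes a closed four-direction chain code with digits 0-3, as in Fig. 8.28, numbered counterclockwise starting from east (the numbering is an assumption). Steps 1-4 (choosing the axis, fitting the order-n rectangle, and resampling) are not shown, and `chain_derivative` and `shape_number` are hypothetical names.

```python
def chain_derivative(chain, directions=4):
    """First difference of a closed chain code, modulo the number of directions.

    Element i is the turn from code i-1 to code i; element 0 wraps around and
    compares the first code with the last, since the boundary is closed.
    """
    return [(chain[i] - chain[i - 1]) % directions for i in range(len(chain))]


def shape_number(chain, directions=4):
    """Steps 5 and 6 of Algorithm 8.9: differentiate, then take the minimal rotation."""
    d = chain_derivative(chain, directions)
    # Every rotation has the same length, so the lexicographic minimum of the
    # digit lists is also the numerically smallest digit string.
    best = min(d[i:] + d[:i] for i in range(len(d)))
    return "".join(str(digit) for digit in best)


# A 2 x 1 rectangle traced counterclockwise (0 = east, 1 = north, 2 = west, 3 = south)
print(shape_number([0, 0, 1, 2, 2, 3]))  # 011011 (order 6)
# The same rectangle rotated by 90 degrees has the same shape number.
print(shape_number([1, 1, 2, 3, 3, 0]))  # 011011
```

Taking the derivative is what makes the result independent of 90-degree rotations (a rotation adds a constant to every chain digit), and the minimal rotation removes the dependence on the starting point.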

Generating a shape number of a specific order may be tricky, as there is a chance that the resulting shape number may have an order greater than n because of deep concavities in the boundary. In this case, the generation procedure can be repeated for smaller values of n until a shape number of n digits is found. Even this strategy may sometimes fail. The shape number may not exist in special cases such as boundaries with narrow indentations. These features may cause step 4 in Algorithm 8.9 to fail in the following way: even though the rectangle of step 3 was of order n, the resulting boundary may have a different order. Nevertheless, for the vast majority of cases, a shape number can be computed.

The degree of similarity for two shapes is the largest order for which their shape numbers are the same. The "distance" between two shapes is the inverse of their degree of similarity. This distance is an ultradistance rather than a norm:

$$\begin{aligned} d(S, S) &= 0 \\ d(S_1, S_2) &> 0 \quad \text{for } S_1 \neq S_2 \\ d(S_1, S_3) &\le \max\big(d(S_1, S_2),\, d(S_2, S_3)\big) \end{aligned} \qquad (8.26)$$

Figure 8.29 shows the similarity tree for six shapes as computed from their shape numbers. When the shape number is well defined, it is a useful measure since it is unique (for each order), it is invariant under rotation and scale changes of an object, and it provides a metric by which shapes can be compared.
[Figure: shapes A-F, their similarity tree, and a table of ultradistances between pairs of shapes]
Fig. 8.29 Six shapes, their similarity tree, and the ultradistances between the shapes.
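The degree of similarity and the ultradistance of Eq. (8.26) can be sketched as follows, assuming each shape is summarized by a table mapping order to shape-number string (for example, produced order by order with Algorithm 8.9). Returning 0 for identical tables is an assumption made here so that d(S, S) = 0 holds on finite tables. The function names and the three tables are hypothetical; the strings are illustrative and not derived from real shapes.

```python
def degree_of_similarity(a, b):
    """Largest order at which two shapes have the same shape number.

    a and b map order -> shape-number string; 0 means no common order matches.
    """
    return max((k for k in a if k in b and a[k] == b[k]), default=0)


def shape_distance(a, b):
    """Ultradistance of Eq. (8.26): the inverse of the degree of similarity."""
    if a == b:
        return 0.0  # identical at every tabulated order: d(S, S) = 0
    k = degree_of_similarity(a, b)
    return float("inf") if k == 0 else 1.0 / k


# Hypothetical shape-number tables (illustrative strings only)
S1 = {4: "1111", 6: "011011", 8: "01101101"}
S2 = {4: "1111", 6: "011011", 8: "00110011"}
S3 = {4: "1111", 6: "010111", 8: "00101111"}
print(degree_of_similarity(S1, S2))  # 6
```

With these tables, d(S1, S3) = 1/4 does not exceed max(d(S1, S2), d(S2, S3)) = max(1/6, 1/4), as the third line of Eq. (8.26) requires.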

EXERCISES

8.1 Consider a region segmentation where regions are of two types: (1) filled in and (2) with holes. Relate the number of junctions, boundaries, and filled-in regions to the Euler number.
8.2 Write a procedure for finding where two chain codes intersect.
8.3 Devise algorithms to intersect and union two regions in the y-axis representation.
8.4 Show that the number of intersections of the curves under a clear strip intersection is odd.
8.5 Modify Algorithm 8.4 to work with strip trees with varying numbers of sons.
8.6 Derive Eq. (8.9) from Eq. (8.7).
8.7 Show that Eqs. (8.12) and (8.13) are equivalent.
8.8 Given two points $\mathbf{x}_1$ and $\mathbf{x}_2$ and slopes $\phi(\mathbf{x}_1)$ and $\phi(\mathbf{x}_2)$, find the ellipse with major axis $a$ that fits the points.
8.9 Write a procedure to intersect two regions represented by quadtrees, producing the quadtree of the intersection.
8.10 Determine the shape numbers for (a) a circle and (b) an octagon. What is the distance between them?

REFERENCES
AMBLER, A. P., H. G. BARROW, C. M. BROWN, R. M. BURSTALL, and R. J. POPPLESTONE. "A versatile system for computer-controlled assembly." Artificial Intelligence 6, 2, 1975, 129-156.
BALLARD, D. H. "Strip trees: A hierarchical representation for curves." Comm. ACM 24, 5, May 1981, 310-321.
BARNHILL, R. E. and R. F. RIESENFELD. Computer-Aided Geometric Design. New York: Academic Press, 1974, 160.
BARROW, H. G. and R. J. POPPLESTONE. "Relational descriptions in picture processing." In MI6, 1971.
BLUM, H. "Biological shape and visual science (Part I)." J. Theoretical Biology 38, 1973, 205-287.
BRIBIESCA, E. and A. GUZMAN. "How to describe pure form and how to measure differences in shapes using shape numbers." Proc., PRIP, August 1979, 427-436.
BRICE, C. R. and C. L. FENNEMA. "Scene analysis using regions." Artificial Intelligence 1, 3, Fall 1970, 205-226.
BROWN, C. M. "Two descriptions and a two-sample test for 3-d vector data." TR49, Computer Science Dept., Univ. Rochester, February 1979.
DE BOOR, C. A Practical Guide to Splines. New York: Springer-Verlag, 1978.
DUDA, R. O. and P. E. HART. Pattern Classification and Scene Analysis. New York: Wiley, 1973.
FREEMAN, H. "Computer processing of line-drawing images." Computing Surveys 6, 1, March 1974, 57-98.
GALLUS, G. and P. W. NEURATH. "Improved computer chromosome analysis incorporating preprocessing and boundary analysis." Physics in Medicine and Biology 15, 1970, 435.
GORDON, W. J. "Spline-blended surface interpolation through curve networks." J. Mathematics and Mechanics 18, 10, 1969, 931-952.
HOROWITZ, S. L. and T. P. PAVLIDIS. "Picture segmentation by a tree traversal algorithm." J. ACM 23, 2, April 1976, 368-388.


KRUGER, R. P., J. R. TOWNE, D. L. HALL, S. J. DWYER, and G. S. LUDWICK. "Automatic radiographic diagnosis via feature extraction and classification of cardiac size and shape descriptors." IEEE Trans. Biomedical Engineering 19, 3, May 1972.
MARR, D. "Representing visual information." AI Memo 415, AI Lab, MIT, May 1977.
MERRILL, R. D. "Representations of contours and regions for efficient computer search." Comm. ACM 16, 2, February 1973, 69-82.
NAHIN, P. J. "The theory and measurement of a silhouette descriptor for image preprocessing and recognition." Pattern Recognition 6, 2, October 1974.
PATON, K. A. "Conic sections in automatic chromosome analysis." In MI5, 1970.
PAVLIDIS, T. Structural Pattern Recognition. New York: Springer-Verlag, 1977.
PERSOON, E. and K. S. FU. "Shape discrimination using Fourier descriptors." Proc., 2nd IJCPR, August 1974, 126-130.
REQUICHA, A. A. G. "Mathematical models of rigid solid objects." TM-28, Production Automation Project, Univ. Rochester, November 1977.
ROBERTS, L. G. "Machine perception of three-dimensional solids." In Optical and Electro-optical Information Processing, J. P. Tippett et al. (Eds.). Cambridge, MA: MIT Press, 1965.
ROSENFELD, A. and A. C. KAK. Digital Picture Processing. New York: Academic Press, 1976.
SAMET, H. "Region representation: quadtrees from boundary codes." Comm. ACM 23, 3, March 1980, 163-170.
SCHNEIER, M. "Linear time calculations of geometric properties using quadtrees." TR-770, Computer Science Center, Univ. Maryland, May 1979.
SHIRAI, Y. "Analyzing intensity arrays using knowledge about scenes." In PCV, 1975.
SKLANSKY, J. "Measuring concavity on a rectangular mosaic." IEEE Trans. Computers 21, 12, December 1972.
SKLANSKY, J. and D. P. KIBLER. "A theory of nonuniformly digitizing binary pictures." IEEE Trans. SMC 6, 9, September 1976, 637-647.
TOMEK, I. "Two algorithms for piecewise-linear continuous approximation of functions of one variable." IEEE Trans. Computers 23, 4, April 1974, 445-448.
TURNER, K. J. "Computer perception of curved objects using a television camera." Ph.D. dissertation, Univ. Edinburgh, 1974.
VOELCKER, H. B. and A. A. G. REQUICHA. "Geometric modelling of mechanical parts and processes." Computer 10, December 1977, 48-57.
WU, S., J. F. ABEL, and D. P. GREENBERG. "An interactive computer graphics approach to surface representations." Comm. ACM 20, 10, October 1977, 703-711.
YOUNG, I. T., J. E. WALKER, and J. E. BOWIE. "An analysis technique for biological shape I." Information and Control 25, 1974.


Representations of Three-Dimensional Structures

9.1 SOLIDS AND THEIR REPRESENTATION

We consider three general classes of representations for rigid solids:

1. Surface or boundary
2. Sweep (in general, generalized cylinders)
3. Volumetric (in general, constructive solid geometry)

The semantics of solid representations is intuitively clear but sometimes mathematically tricky. The representations have different computational properties, and readers should keep this in mind when assessing a representation for possible use. As a simple example, a surface representation can describe how an object looks; a volumetric version, which expresses the solid as a combination of subparts, may not explicitly contain information about the surface of the object. However, the solid representation may be better for matching, if it can be structured to reflect functional subparts.

Certainly we believe, as do others, that model-based vision will ultimately have to confront the issues of geometric modeling in three dimensions [Nishihara 1979]. Ultimately, nonrigid as well as rigid solids will have to be represented. The characterization of nonrigid solids presents very challenging problems.

Nonrigid solids are often a useful way to model time-varying aspects of objects. Here, again, the kind of model that is best depends heavily on the domain. For example, a useful mammal model may be one with a piecewise rigid linkage (for the skeleton) and some elastic covering (for the flesh). Computer vision in the domain of mammals, either static in various positions or actually moving, might be based on generalized cylinders (Section 9.3). However, another nonrigid domain is that of heart chambers, which change through time as the heart beats. Here the skeleton is a much less intuitive notion, so a different model of nonrigidity may apply. In most cases, nonrigid objects are modeled as parameterized rigid objects. In the example of the human figure, the parameters may be joint angles for linkages representing the skeleton.

The last part of this chapter deals with understanding line drawings, an influential and well-publicized subfield of computer vision. This seemingly simple and accessible domain avoids many of the problems involving early processing and segmentation, yet it is important because it has furnished several important algorithmic and geometric insights. An important breakthrough in this domain was a move from "image understanding" in two dimensions to an approach based on the three-dimensional world and laws governing three-dimensional solids.

9.2 SURFACE REPRESENTATIONS

The enclosing surface, or boundary, of a well-behaved three-dimensional object should unambiguously specify the object [Requicha 1980]. Since surfaces are what is seen, these representations are important for computer vision. Section 9.2.1 considers mainly planar polyhedral surface representations. More complex "sculptured surfaces" [Forrest 1972; Barnhill and Riesenfeld 1974; Barnhill 1977] are treated in Section 9.2.2. Some useful surfaces are defined as functions of three-dimensional directions from a central point of origin. Two of these are mentioned in Section 9.2.3.
9.2.1 Surfaces with Faces

Figure 9.1 shows the solid representation scheme most familiar to computer scientists. Solids are represented by their boundaries, or enclosing surfaces, which are represented in terms of such primitive entities as unbounded mathematical surfaces, curves, and points, which together may be used to define "faces."

In general, a boundary is made up of a number of faces; faces are represented by mathematical surfaces and by information about their own boundaries (consisting of edges and possibly vertices). A closed surface such as the sphere or a spherical harmonic surface of Section 9.2.3 may be thought of as having only one face.

To specify a boundary representation, one must answer several important questions of representation design. What is a face, and how are faces represented? What is an edge, and how are edges represented? How much extra information (i.e., useful but redundant relationships and geometric data) should be kept?

What is a face? "Face" is an initially appealing but imprecise notion; it is at its clearest in the context of planar polyhedra. A face should probably always be a subset of the boundary of an object; presumably, it should have area but no dangling edges or isolated points, and the union of all the faces should make up the boundary of the object. Beyond this, little can be said. For many purposes it makes sense to have faces overlap; it may be elegant to consider the letter on an alphabet block a special kind of face on the block that is a subset of the face making up the side of the block. On the other hand, it is easy to imagine applications in which faces should not overlap in area (then one easily can compute the surface area of a solid from its faces). In some objects, just what the faces are is purely a matter of opinion (Fig. 9.2). In short, any single definition of face is likely to be inadequate for some important application.

[Figure: a volume and its faces]
Fig. 9.1 A volume and the faces of a boundary representation.

Fig. 9.2 What are the faces?

The availability of explicit representations of edges, faces, and vertices makes boundary representations quite useful in computer vision and graphics. The computational advantages of polyhedral surfaces are so great that they are often pressed into service as approximate representations of nonpolyhedra (Fig. 9.3).

Fig. 9.3 A polyhedral approximation to a portion of a canine heart at systole and diastole. Both exterior (coarse grid) and interior surfaces (fine grid) are shown.

An influential system for using face-based representations for planar polyhedral objects is the "winged edge" representation [Baumgart 1972]. Included in the system is an editor for creating complex polyhedral objects (such as that of Fig. 9.3) interactively. The system uses rules for construction based on the theorem of Euler that if $V$ is the number of vertices in a polyhedron, $E$ is the number of edges, and $F$ the number of faces, then $V - E + F = 2$. In fact, the formula can be extended to deal with non-simply connected bodies. The extended relation is $V - E + F = 2(B - H)$, with $B$ being the number of bodies and $H$ being the number of holes, or "handles," each resulting from a hole through a body [Lakatos 1976]. Baumgart's system uses these rules to oversee and check certain validity conditions on the constructions made by the editor.

The "winged edge" polyhedron representation achieves many desiderata for boundary representations in an elegant way. This representation is presented below to give a flavor of the features that have traditionally been found useful. Given as primitives the vertices, edges, faces, and polyhedra themselves, and given various relations between these primitives, one naturally thinks of a record and pointer (relational) structure in which the pointers capture the binary relations and the records represent primitives and contain data about their locations or parameters.

In the winged edge representation, there are data structure records, or nodes, which contain fields holding data or links (pointers) to other nodes. An example using this structure to describe a tetrahedron is shown in Fig. 9.4. There are four kinds of nodes: vertices, edges, faces, and bodies. To allow convenient access to these nodes, they are arranged in a circular doubly linked list. The body nodes are actually the heads of circular structures for the faces, edges, and vertices of the body. Each face points to one of its perimeter edges, and each vertex points to one of the edges impinging on it. Each edge node has links to the faces on each side of it, and to the vertices at either end. Figure 9.4 shows only the last-mentioned links associated with each edge node. The reader may notice the similarity of this data structure to the data structure for region merging in Section 5.4; they are topologically equivalent.

Each edge also has four associated links which give the name "winged edge" to the representation. These links specify neighboring edges in order around the two faces which are associated with the edge. The complete link set for an edge is shown in Fig. 9.5, together with the link information for bodies, vertices, and faces. To allow unambiguous traversal around faces, and to preserve the notion of interior and exterior of a polyhedron, a preferential ordering of vertices and lines is picked (counterclockwise, say, as seen from outside the polyhedron).

Fig. 9.4 A subset of edge links for a tetrahedron using the "winged edge" representation.

Data fields in each vertex allow storage of three-dimensional world coordinates, and also of three-dimensional perspective coordinates for display. Each node has fields specifying its node type, hidden-line elimination information, and other general information. Faces have fields for surface normal vector information, surface reflectance, and color characteristics. Body nodes carry links that relate them to a tree structure of bodies in a scene, allowing for hierarchical arrangement of subbodies into complex bodies. Thus body node data describe the scene structure; face node data describe surface characteristics; edge node data give the topological information needed to relate faces, edges, and vertices; and vertex node data describe the three-dimensional vertex location.

This rich and redundant structure lends itself to efficient calculation of useful functions involving these bodies. For instance, one can easily follow pointers to extract the list of points around a face, faces around a point, or lines around a face. Winged edges are not a universal boundary representation for polyhedra, but they do give an idea of the components of a representation that are likely to be useful. Such a representation can be made efficient for accessing all faces, edges, or vertices; for accessing vertex or edge perimeters; for polyhedron building; and for splitting edges and faces (useful in construction and hidden-line picture production, for instance).
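A minimal Python sketch of the winged-edge idea for a closed polyhedron follows. Field names echo Fig. 9.5 (NFace, PFace, NVert, PVert, NCW, NCCW, PCW, PCCW), but their exact semantics here are an assumption: an edge runs from PVert to NVert when traversed counterclockwise (seen from outside) around NFace, NCCW is the next edge counterclockwise around NFace, and NCW the previous one, with PCCW and PCW defined likewise for PFace. Body nodes and the circular doubly linked rings are omitted, with Python dicts and lists standing in for them; `build` and `face_vertices` are hypothetical helpers. The usage builds a tetrahedron and checks Euler's formula.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Vertex:
    name: str
    xyz: Tuple[float, float, float]
    edge: Optional["Edge"] = None   # one edge impinging on the vertex


@dataclass(eq=False)
class Face:
    name: str
    edge: Optional["Edge"] = None   # one perimeter edge


@dataclass(eq=False)
class Edge:
    pvert: Vertex                   # runs PVert -> NVert counterclockwise around NFace
    nvert: Vertex
    nface: Optional[Face] = None
    pface: Optional[Face] = None
    nccw: Optional["Edge"] = None   # next edge counterclockwise around NFace
    ncw: Optional["Edge"] = None    # previous edge around NFace
    pccw: Optional["Edge"] = None   # the same two wings, for PFace
    pcw: Optional["Edge"] = None


def build(coords, loops):
    """Build nodes from face loops listed counterclockwise as seen from outside.

    Assumes a closed, consistently oriented polyhedron: every edge is used
    exactly twice, once in each direction, by two different faces.
    """
    verts = {n: Vertex(n, xyz) for n, xyz in coords.items()}
    edges: Dict[frozenset, Edge] = {}
    rings: Dict[Face, List[Edge]] = {}
    for fname, loop in loops.items():
        face, ring = Face(fname), []
        for a, b in zip(loop, loop[1:] + loop[:1]):
            e = edges.get(frozenset((a, b)))
            if e is None:           # first use: this face becomes NFace
                e = Edge(pvert=verts[a], nvert=verts[b], nface=face)
                edges[frozenset((a, b))] = e
                verts[a].edge = verts[a].edge or e
                verts[b].edge = verts[b].edge or e
            else:                   # second use, in the opposite direction: PFace
                e.pface = face
            ring.append(e)
        face.edge = ring[0]
        rings[face] = ring
    for face, ring in rings.items():  # fill in the four wing links
        for i, e in enumerate(ring):
            succ, pred = ring[(i + 1) % len(ring)], ring[i - 1]
            if e.nface is face:
                e.nccw, e.ncw = succ, pred
            else:
                e.pccw, e.pcw = succ, pred
    return verts, list(edges.values()), list(rings)


def face_vertices(face):
    """Follow wing links counterclockwise around a face, listing its vertices."""
    out, e = [], face.edge
    while True:
        out.append((e.pvert if e.nface is face else e.nvert).name)
        e = e.nccw if e.nface is face else e.pccw
        if e is face.edge:
            return out


coords = {"A": (0, 0, 0), "B": (1, 0, 0), "C": (0, 1, 0), "D": (0, 0, 1)}
loops = {"F1": ["A", "C", "B"], "F2": ["A", "B", "D"],
         "F3": ["A", "D", "C"], "F4": ["B", "C", "D"]}
verts, edges, faces = build(coords, loops)
V, E, F = len(verts), len(edges), len(faces)
print(V - E + F)                # 2, as Euler's formula requires
print(face_vertices(faces[2]))  # ['A', 'D', 'C']
```

Note that `face_vertices` never consults the stored face loops; it recovers each face perimeter purely by following edge-to-edge wing pointers, which is the point of the representation.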
9.2.2 Surfaces Based on Splines

The natural extension of polyhedral surfaces is to allow the surfaces to be curved. However, with an arbitrary number of edges for the surface, the interpolation of interior face points becomes impractically complex. For that reason, the number of edges for a curved face is usually restricted to three or four.

[Figure: an edge with its faces NFace and PFace, its vertices NVert and PVert, and its wing edges NCW, NCCW, PCW, PCCW]
Boundary representation node accessing functions:
To enter and traverse the face ring of a body: NextFace, PreviousFace: Body or Face → Face
To enter and traverse the edge ring of a body: NextEdge, PreviousEdge: Body or Edge → Edge
To enter and traverse the vertex ring of a body: NextVert, PreviousVert: Body or Vertex → Vertex
First edge of a face: FirstEdge: Face → Edge
First edge of a vertex: FirstEdge: Vertex → Edge
Faces of an edge: NFace (next face), PFace (previous face): Edge → Face
Vertices of an edge: NVert (next vertex), PVert (previous vertex): Edge → Vertex
Neighboring wing edges of an edge: NCW, NCCW: Edge → Edge (NFace edge clockwise, NFace edge counterclockwise); PCW, PCCW: Edge → Edge (PFace edge clockwise, PFace edge counterclockwise)
Fig. 9.5 (a) Node accessing functions. (b) Semantics of winged edge functions.

A general technique for approximating surfaces with four-sided surface patches is that of Coons [Coons 1974]. Coons specifies the four sides of the patch with polynomials. These polynomials are used to interpolate interior points. Although this is appropriate for synthesis, it is not so easy to use for analysis, because of the difficulty of registering the patch edges with image data. A given surface will admit many patch decompositions.

An attractive representation for patches is splines (Fig. 9.6). In general, two-dimensional spline interpolation is complex: For two parameters $u$ and $v$, interpolate with

$$\mathbf{x}(u, v) = \sum_{i}\sum_{j} \mathbf{v}_{ij}\, B_{ij}(u, v) \qquad (9.1)$$

similar to Eq. (8.4). However, for certain applications a further simplification can be made. In a manner analogous to (8.9), define a grid of knot points $\mathbf{v}_{ij}$ corresponding to $\mathbf{x}_{ij}$ and related by

$$\mathbf{x}_{ij} = M\,\mathbf{v}_{ij} \qquad (9.2)$$

Now, rather than interpolating in two dimensions simultaneously, interpolate in one direction, say $t$, to obtain

$$\mathbf{x}_{ij}(t) = [t^3\;\; t^2\;\; t\;\; 1]\,[C]\,[\mathbf{v}_{i-1,j},\; \mathbf{v}_{ij},\; \mathbf{v}_{i+1,j},\; \mathbf{v}_{i+2,j}]^T \qquad (9.3)$$

for each value of $j$. Now compute $\mathbf{v}_{ij}(t)$ by solving

$$\mathbf{x}_{ij}(t) = M\,\mathbf{v}_{ij}(t) \qquad (9.4)$$

for each value of $t$. Finally, interpolate in the other direction and solve:

$$\mathbf{x}_{ij}(s, t) = [s^3\;\; s^2\;\; s\;\; 1]\,[C]\,[\mathbf{v}_{i,j-1}(t),\; \mathbf{v}_{ij}(t),\; \mathbf{v}_{i,j+1}(t),\; \mathbf{v}_{i,j+2}(t)]^T \qquad (9.5)$$

This is the basis for the spline filtering algorithm discussed in Section 3.2.3.

Fig. 9.6 Using spline curves to model the surface of an object: a portion of a human spinal column taken from CAT data.

Some advantages of spline surfaces for vision are the following.

1. The spline representation is economical: the space curves are represented as a sparse set of knot points from which the underlying curves can be interpolated.
2. It is easy to define splines interactively by giving the knot points; reference representations may be built up easily.
3. It is often useful to search the image in a direction perpendicular to the model reference surface. This direction is a simple function of the local knot points.

9.2.3 Surfaces That Are Functions on the Sphere

Some surfaces can be expressed as functions on the "Gaussian sphere" (the distance from the origin to a point on the surface is a function of the direction of the point, or of its longitude and latitude if it were radially projected onto a sphere with its center at the origin). This class of surfaces, although restricted, is useful in some application areas [Schudy and Ballard 1978, 1979]. This section briefly explores two schemes for representing these surfaces. The first specifies explicitly the distance of the surface from the origin for a set of vector directions from the origin. The second is akin to Fourier descriptors; an economically specified set of coefficients characterizes the surface with greater accuracy as the number of coefficients increases.

Direction-Magnitude Sets

One approximation to a spherical function is to specify a number of three-dimensional direction vectors from the origin and, for each, a magnitude. This is equivalent to specifying a set of $(\theta, \phi, \rho)$ points in a spherical coordinate system (Appendix 1). These points are on the surface to be represented; connecting them yields an approximation.
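A direction-magnitude set can be sketched in a few lines, under assumptions that are ours rather than the text's: $\theta$ is longitude and $\phi$ is colatitude measured from the +z axis (Appendix 1 may use a different convention), and the directions are the six vertices of a regular octahedron, a coarse stand-in for the geodesic tessellations discussed next. The names `to_cartesian`, `OCTAHEDRON`, and `direction_magnitude_vertices` are hypothetical.

```python
import math


def to_cartesian(theta, phi, rho):
    """(theta, phi, rho) -> (x, y, z); theta = longitude, phi = colatitude (assumed)."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))


# Unit directions of a regular octahedron: both poles plus four equatorial points
OCTAHEDRON = [(0.0, 0.0), (0.0, math.pi)] + [
    (k * math.pi / 2, math.pi / 2) for k in range(4)]


def direction_magnitude_vertices(directions, magnitude):
    """Move each direction radially out (or in) by its associated magnitude."""
    return [to_cartesian(t, p, magnitude(t, p)) for t, p in directions]


# A surface elongated toward the poles: rho = 1 + |cos(phi)|
verts = direction_magnitude_vertices(OCTAHEDRON, lambda t, p: 1 + abs(math.cos(p)))
print(verts[0])  # (0.0, 0.0, 2.0), the north pole pushed out to radius 2
```

Connecting the displaced vertices with the octahedron's original edges gives the distorted polyhedron described in the next paragraph; a finer tessellation simply supplies more directions.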
It is often convenient to represent directions as points on the unit (Gaussian) sphere centered on the origin. The points may be connected by straight lines to form a polyhedron with triangular, hexagonal, or rhomboidal faces. Moving the points on the sphere out (or in) by their associated magnitudes distorts this polyhedron, moving its vertices radially out or in.

The spherical function determines the distance of face vertices from the origin. Resolution at the surface increases with the number of faces. An approximately isotropic distribution of directions over the surface may be obtained by placing the face vertices (directions) in accordance with "geodesic dome"-like calculations, which make the faces approximately equilateral triangles [Clinton 1971]. Although the geodesic tessellation of the sphere's surface is more complex than a straightforward (latitude and longitude, say) division, its pleasant properties of isotropy and display [Brown 1979a; 1979b; Schudy and Ballard 1978] sometimes recommend it. Some example shapes indicating the range of representable surfaces are given in Fig. 9.7. Methods for tessellating the sphere are given in Appendix 1.

Fig. 9.7 Sample surfaces described by some 320 triangular facets in a geodesic tessellation.

Spherical Harmonic Surfaces

In two dimensions, Fourier coefficients can give approximations to certain curved boundaries (Section 8.3.4). Analogously, in three dimensions a set of orthogonal functions may be used to express a closed boundary as a set of coefficients when the boundary is a function on the sphere. One such decomposition is spherical harmonics. Low-order coefficients capture gross shape characteristics; higher-order coefficients represent surface shape variations of higher spatial frequency. The function with $m = 0$ is a sphere, the three with $m = 1$ represent translation about the origin, the five with $m = 2$ are similar to prolate and oblate spheroids, and so forth, the lobedness of the surfaces increasing with $m$. A sample three-dimensional shape and its "description" are shown in Fig. 9.8.

Fig. 9.8 A spherical harmonic function description of an ellipsoid. Coefficients are displayed on the right as grey levels in a matrix format.

Spherical harmonics are analogs on the sphere of Fourier functions on the plane; like Fourier functions, they are smooth and continuous to every order. They may be parameterized by two numbers, $m$ and $n$; thus they are a doubly infinite set of functions which are continuous, orthogonal, single-valued, and complete on the sphere. In combination, the harmonics can thus produce all "well-behaved" spherical functions. The spherical harmonic functions $U_{mn}(\theta, \phi)$ and $V_{mn}(\theta, \phi)$ are defined in polar coordinates by:

$$U_{mn}(\theta, \phi) = \cos(n\theta)\,\sin^{n}(\phi)\,P(m, n, \cos\phi) \qquad (9.6)$$

$$V_{mn}(\theta, \phi) = \sin(n\theta)\,\sin^{n}(\phi)\,P(m, n, \cos\phi) \qquad (9.7)$$

with $m = 0, 1, 2, \ldots, M$ and $n = 0, 1, \ldots, m$. Here $P(m, n, x)$ is the $n$th derivative of the $m$th Legendre polynomial as a function of $x$. To represent an arbitrary shape, let the radius $R$ in polar coordinates be a linear sum of these spherical harmonics:

$$R(\theta, \phi) = \sum_{m=0}^{M}\sum_{n=0}^{m} A_{mn}\,U_{mn}(\theta, \phi) + B_{mn}\,V_{mn}(\theta, \phi) \qquad (9.8)$$
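Eqs. (9.6)-(9.8) can be evaluated directly. The sketch below builds Legendre polynomial coefficients with Bonnet's recurrence, differentiates them $n$ times to get $P(m, n, x)$, and sums the harmonics with coefficient dictionaries keyed by $(m, n)$. The dictionary layout and the names `legendre_coeffs`, `P`, and `radius` are hypothetical, and $\phi$ is taken as the polar angle measured from the pole, consistent with the $\sin\phi$ factors.

```python
import math


def legendre_coeffs(m):
    """Coefficients of the Legendre polynomial P_m, lowest degree first."""
    prev, cur = [1.0], [0.0, 1.0]
    if m == 0:
        return prev
    for k in range(1, m):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        nxt = [0.0] * (k + 2)
        for i, c in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * c
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        prev, cur = cur, [c / (k + 1) for c in nxt]
    return cur


def P(m, n, x):
    """P(m, n, x): n-th derivative of the m-th Legendre polynomial at x."""
    coeffs = legendre_coeffs(m)
    for _ in range(n):
        coeffs = [i * c for i, c in enumerate(coeffs)][1:]
    return sum(c * x ** i for i, c in enumerate(coeffs))


def radius(A, B, theta, phi):
    """R(theta, phi) of Eq. (9.8); A and B map (m, n) -> real coefficient."""
    x, s = math.cos(phi), math.sin(phi)
    r = 0.0
    for (m, n), a in A.items():
        r += a * math.cos(n * theta) * s ** n * P(m, n, x)  # A_mn U_mn, Eq. (9.6)
    for (m, n), b in B.items():
        r += b * math.sin(n * theta) * s ** n * P(m, n, x)  # B_mn V_mn, Eq. (9.7)
    return r


# A sphere of radius 2 (the (0, 0) term) plus a small (2, 0) spheroid-like term
A = {(0, 0): 2.0, (2, 0): 0.5}
print(radius(A, {}, 0.0, 0.0))  # 2.5 at the pole
```

At the equator the same coefficients give $2 + 0.5\,P_2(0) = 1.75$, so this surface is a prolate, spheroid-like shape, in line with the description of the $m = 2$ terms.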

Any continuous surface on the sphere may be represented by a set of these real constants; reasonable approximations to heart volumes are obtained with $m < 5$ [Schudy and Ballard 1979].

Figure 9.9 shows a few simple combinations of functions of low values of $(m, n)$. The sphere, or $(0, 0)$ surface, is added to the more complex ones to ensure positive volumes and drawable surfaces.

Spherical harmonics have the following attractive properties.

1. They are orthogonal on the sphere under the inner product
$$\langle u, v \rangle = \int\!\!\int u\,v\,\sin\phi\; d\theta\, d\phi$$
2. The functions are arranged in increasing order of spatial complexity.
3. The whole set is complete; any twice-differentiable function on the sphere can be approximated arbitrarily closely.

Fig. 9.9 Simple combinations of functions.

Spherical harmonics can provide compact, nonredundant descriptions of surfaces that are useful for analysis of shape, but are less useful for synthesis. The principal disadvantages are that the primitive functions are not necessarily related to the desired final shape in an intuitive way, and changing a single coefficient affects the entire resulting surface.

An example of the use of spherical harmonics as a volume representation is the representation of heart volume [Schudy and Ballard 1978, 1979]. Extracting a volume associated with the heart from ultrasound data involves a large mass of data. The data is originally in the form of echo measurements taken in a set of two-dimensional planes through the heart. The task is to choose, by optimization techniques, a surface surrounding the heart volume of interest that will fit three-dimensional time-varying data. The optimization involved is to find the best coefficients for the spherical harmonics that define the surface. The goodness of fit of a surface is measured by how well it matches the edge of the volume as it appears in the data slices. To extend spherical harmonics to time-varying periodic data, let the radius $R$ in polar coordinates be a linear sum of these spherical harmonics:
$$R(\theta, \phi, t) = \sum_{m=0}^{M}\sum_{n=0}^{m} A_{mn}(t)\,U_{mn}(\theta, \phi) + B_{mn}(t)\,V_{mn}(\theta, \phi) \qquad (9.9)$$

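The functions $A_{mn}(t)$ and $B_{mn}(t)$ are given by Fourier time series:

$$A_{mn}(t) = a_{mn0} + \sum_{i=1}^{l} a_{mni}\cos(2\pi i t/T) + b_{mni}\sin(2\pi i t/T) \qquad (9.10)$$

$$B_{mn}(t) = b_{mn0} + \sum_{i=1}^{l} c_{mni}\cos(2\pi i t/T) + d_{mni}\sin(2\pi i t/T) \qquad (9.11)$$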

where $t$ is time, the $a_{mni}$, $b_{mni}$, $c_{mni}$, and $d_{mni}$ are arbitrary real constants, and $T$ is the period. Any continuous periodically moving surface on the sphere may be represented by some selection of these real constants; in the cardiac application, reasonable approximations to the temporal behavior are obtained with $l \le 3$. Figure 9.10 shows three stages from a moving harmonic surface representation of the heart in early systole. The atria, at the top, contract and pump blood into the ventricles below, after which there is a ventricular contraction.
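Each time-varying coefficient in Eqs. (9.10) and (9.11) is a truncated Fourier series in $t$, which the following sketch evaluates. The function name `fourier_coefficient`, the split into separate cosine and sine lists, and the sample numbers (including a period of 0.8 time units) are hypothetical, chosen only for illustration.

```python
import math


def fourier_coefficient(c0, cos_terms, sin_terms, t, period):
    """c0 + sum_i [cos_terms[i-1] cos(2 pi i t / T) + sin_terms[i-1] sin(2 pi i t / T)].

    This is the form of Eqs. (9.10)-(9.11); i runs from 1 to l = len(cos_terms).
    """
    total = c0
    for i, (a, b) in enumerate(zip(cos_terms, sin_terms), start=1):
        w = 2.0 * math.pi * i * t / period
        total += a * math.cos(w) + b * math.sin(w)
    return total


# A hypothetical A_00(t) with l = 1 over a period of 0.8 time units
print(fourier_coefficient(2.0, [0.3], [0.1], 0.0, 0.8))  # 2.3
```

Substituting such an $A_{mn}(t)$ and $B_{mn}(t)$ for the constant coefficients of Eq. (9.8) gives the moving surface of Eq. (9.9), which by construction repeats with period $T$.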

9.3 GENERALIZED CYLINDER REPRESENTATIONS

The volume of many biological and manufactured objects is naturally described as the "swept volume" of a two-dimensional set moved along some three-space curve. Figure 9.11 shows a "translational sweep," wherein a solid is represented as the volume swept by a two-dimensional set when it is translated along a line. A "rotational sweep" is similarly defined by rotating the two-dimensional set around an axis. In "three-dimensional sweeps," volumes are swept. In a "general" sweep scheme, the two-dimensional set or volume is swept along an arbitrary space curve, and the set may vary parametrically along the curve [Binford 1971; Soroka and Bajcsy 1976; Soroka 1979a; 1979b; Shani 1980]. General sweeps are quite a popular representation in computer vision, where they go by the name generalized cylinders (sometimes "generalized cones").

Fig. 9.10 Three stages from a moving harmonic surface (see text and color insert).

[Figure: a two-dimensional set swept along a line]
Fig. 9.11 A translational sweep.

A generalized cylinder (GC) is a solid whose axis is a 3D space curve (Fig. 9.12a). At any point on the axis, a closed cross section is defined. A usual restriction is that the axis be normal to the cross section. Usually it is easiest to think of an axis space curve and a cross-section point-set function, both parameterized by arc length along the axis curve. For any solid, there are infinitely many pairs of axis and cross-section functions that can define it.

Generalized cylinders present certain technical subtleties in their definition. For instance, can it be determined whether any two cross sections intersect, as they would if the axis of a circular cylinder were sharply bent (Fig. 9.12b)? If the solid is defined as the volume swept by the cross section, there is no conceptual or computational problem. A problem might occur when computing the surface of such an object. If the surface is expressed in terms of the axis and cross-section functions (as below), the domain of objects must be limited so that the boundary formula indeed gives only points on the boundary.

Generalized cylinders are intuitive and appealing. Let us grant that "pathological" cases are barred, so that relatively simple mathematics is adequate for representing them. There are still technical decisions to make about the representation. The axis curve presents no difficulties, but a usable representation for the cross-section set is often not so straightforward. The main problem is to choose a usable coordinate system in which to express the cross section.

9.3.1 Generalized Cylinder Coordinate Systems and Properties

Two mathematical functions defining axis and cross section for each point define a unique solid with the "sweeping" semantics described above. In a fixed Cartesian coordinate system $x, y, z$, the axis may be represented parametrically as a function of arc length $s$:

$$\mathbf{a}(s) = \big(x(s),\, y(s),\, z(s)\big) \qquad (9.12)$$

It is convenient to have a local coordinate system defined with its origin at each point of $\mathbf{a}(s)$. It is in this coordinate system that the cross section is defined. This system may change in orientation as the axis winds through space, or it may be most natural for it not to be tied to the local behavior of the axis. For instance, imagine tying a knot in a solid rubber bar of square cross section. The cross section will stay approximately a square, and (this is the point) will remain approximately fixed in a coordinate system that twists and turns through space with the axis of the bar. On the other hand, imagine bolt threads. They can be described by a single cross section that stays fixed in a coordinate system that rotates as it moves along the straight axis of the bolt. There is no a priori reason to suppose that such a useful local coordinate system should twist along the GC axis.

[Figure: (a) a generalized cylinder with cross-sectional coordinate systems; (b) a sharply bent circular cylinder]
Fig. 9.12 (a) A generalized cylinder and some cross-sectional coordinate systems. (b) A possibly "pathological" situation. Cross sections may be simply described as circles centered on the axis, but then their intersection makes volume calculations (for instance) less straightforward.

A coordinate system that mirrors the local behavior of the GC axis space curve is the "Frenet frame," defined at each point on the GC axis. This frame provides much information about the GC axis behavior. The GC axis point forms the origin, and the three orthogonal directions are given by the vectors $(\xi, \nu, \zeta)$, where

$\xi$ = unit vector tangent to the axis
$\nu$ = unit vector in the direction of the center of curvature of the axis (the normal)
$\zeta$ = unit vector in the direction of the center of torsion of the axis (the binormal)

Consider the curve to be produced by a point moving at constant speed through space; the distance the point travels is the parameter of the space curve [O'Neill 1966]. Since $\xi$ is of constant length, its derivative measures the way the GC axis turns in space. Its derivative $\xi'$ is orthogonal to $\xi$, and the length of $\xi'$ measures the curvature $\kappa$ of the axis at that point. The unit vector in the direction of $\xi'$ is $\nu$. Where the curvature is not zero, a binormal vector $\zeta$ orthogonal to $\xi$ and $\nu$ is defined. This binormal is used to define the torsion $\tau$ of the curve. The vectors $\xi$, $\nu$, $\zeta$ obey Frenet's formulae:

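$$\mathbf{t}' = \kappa\,\boldsymbol{\nu}, \qquad \boldsymbol{\nu}' = -\kappa\,\mathbf{t} + \tau\,\boldsymbol{\beta}, \qquad \boldsymbol{\beta}' = -\tau\,\boldsymbol{\nu} \tag{9.13}$$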

Ch. 9 Representations of Three-Dimensional Structures

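where

$$\kappa = \text{curvature} = -\boldsymbol{\nu}'\cdot\mathbf{t} = \boldsymbol{\nu}\cdot\mathbf{t}' \tag{9.14}$$

$$\tau = \text{torsion} = \boldsymbol{\nu}'\cdot\boldsymbol{\beta} = -\boldsymbol{\nu}\cdot\boldsymbol{\beta}' \tag{9.15}$$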
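As a numerical check on Eqs. (9.13)-(9.15), the sketch below estimates the Frenet frame, curvature, and torsion of a sampled axis curve by central finite differences. It assumes the axis is a smooth parametric function a(s) that need not be arc-length parameterized, so it uses the standard parameter-independent formulas κ = |a'×a''|/|a'|³ and τ = (a'×a'')·a'''/|a'×a''|². The helix test curve and the step size h are illustrative choices, not from the text.

```python
import math

def add(u, v):
    return (u[0] + v[0], u[1] + v[1], u[2] + v[2])

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def scale(k, u):
    return (k * u[0], k * u[1], k * u[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

def derivatives(curve, s, h=1e-2):
    """Central-difference estimates of the first three derivatives of curve at s."""
    c = curve(s)
    p1, p2 = curve(s + h), curve(s + 2 * h)
    m1, m2 = curve(s - h), curve(s - 2 * h)
    d1 = scale(1 / (2 * h), sub(p1, m1))
    d2 = scale(1 / h ** 2, add(sub(p1, scale(2, c)), m1))
    d3 = scale(1 / (2 * h ** 3), add(sub(p2, scale(2, p1)), sub(scale(2, m1), m2)))
    return d1, d2, d3

def frenet_frame(curve, s, h=1e-2):
    """Return (t, nu, beta, kappa, tau) at parameter s; undefined where kappa == 0."""
    d1, d2, d3 = derivatives(curve, s, h)
    c = cross(d1, d2)                  # parallel to the binormal
    t = scale(1 / norm(d1), d1)        # unit tangent
    beta = scale(1 / norm(c), c)       # unit binormal
    nu = cross(beta, t)                # unit normal, toward the center of curvature
    kappa = norm(c) / norm(d1) ** 3
    tau = dot(c, d3) / dot(c, c)
    return t, nu, beta, kappa, tau

# Helix a(s) = (2 cos s, 2 sin s, s): analytically kappa = 2/5 and tau = 1/5.
helix = lambda s: (2 * math.cos(s), 2 * math.sin(s), s)
t, nu, beta, kappa, tau = frenet_frame(helix, 0.7)
print(kappa, tau)  # close to 0.4 and 0.2
```

Note that the frame breaks down exactly where the text says it does: when a' × a'' vanishes (zero curvature), β and ν cannot be normalized.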

The Frenet frame gives good information about the axis of the GC, but it has certain problems. First, it is not well defined when the curvature of the GC axis is zero. Second, it may not reflect known underlying physical principles that generate the cross sections (as in the bolt thread example). A solution, adopted in [Agin 1972; Shani 1980], is to introduce an additional parameter that allows the cross section to rotate about the local axis by an arbitrary amount. With this additional degree of freedom comes an additional problem: How are successive cross sections registered? Figure 9.13 shows two solutions in addition to the Frenet frame solution.

The cross-sectional curve is usually defined to be in the ν-β plane, normal to t, the local GC axis direction. The cross section may be described as a point set in this plane, using inequalities expressed in the ν, β coordinate system. The cross-section boundary (outline curve) may be used instead, parameterized by another parameter r. Let this curve be given by

cross-section boundary = (x(r, s), y(r, s))

The dependence on s reflects the fact that the cross-section shape may vary along the GC axis. The expression above is in world coordinates, but should be moved to

Fig. 9.13 (a) Local coordinates are the Frenet frame. Points A and B must correspond. (b) Local coordinates are determined by the cross-sectional shape. (c) Local coordinates are determined by a heuristic transformation from world coordinates.

the local coordinates on the GC axis. A transformation of coordinates allows the GC boundary to be expressed (if the GC is well behaved) as

$$\mathbf{B}(r, s) = \mathbf{a}(s) + x(r, s)\,\boldsymbol{\nu}(s) + y(r, s)\,\boldsymbol{\beta}(s) \tag{9.16}$$

One of the advantages of the generalized cylinder representation is that it allows many parameters of the solid to be easily calculated. In matching the GC to image data it is often necessary to search perpendicular to a cross section. This direction is given from x(r, s), y(r, s) by (−(dy/ds)ν, (dx/ds)β). The area of a cross section may be calculated from Eq. (8.16). The volume of a GC is given by the integral of the area as a function of the axis parameter multiplied by the incremental path length of the GC axis, i.e.,

$$\text{volume} = \int \text{area}(s)\,ds$$
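The volume formula can be checked numerically. The sketch below assumes a GC with a straight axis (so ds is simply the step along the axis), samples each cross section as a polygon, and computes its area with the shoelace formula, used here as a stand-in for Eq. (8.16). It then integrates area(s) with Simpson's rule. The cone test case, whose exact volume is πR²h/3, and all function names are illustrative.

```python
import math

def polygon_area(points):
    """Shoelace formula for a simple closed polygon given as [(x, y), ...]."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

def cross_section(radius, n=360):
    """Circular cross-section boundary (x(r), y(r)) sampled at n values of r."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def gc_volume(radius_fn, length, steps=100):
    """Integrate area(s) ds along a straight axis of the given length (Simpson's rule)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = length / steps
    total = 0.0
    for k in range(steps + 1):
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += w * polygon_area(cross_section(radius_fn(k * h)))
    return total * h / 3.0

R, H = 1.0, 3.0
cone = lambda s: R * (1 - s / H)   # radius shrinks linearly to a point at s = H
v = gc_volume(cone, H)
print(v, math.pi * R ** 2 * H / 3)  # about 3.1414 vs 3.1416; the gap comes from the 360-gon
```

For a curved axis the same loop applies if s is arc length along the axis and each cross section is taken normal to the local tangent.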
9.3.2 Extracting Generalized Cylinders

Early work in biological form analysis provides an example of the process of fitting a GC to real data and producing a description [Agin 1972]. One of the goals of this work was to infer the stick-figure skeleton of biological forms for use in matching models also represented as skeletons. In Fig. 9.14 the process of inferring the axis from the original stripe three-dimensional data is shown; the process iterates toward a satisfactory fit, using only circular cross sections (a common constraint with "generalized" cylinders). Figure 9.15 shows the data and the analysis of a complex

Fig. 9.14 Stages in extracting a generalized cylinder description for a circular cone. (a) Front view. (b) Initial axis estimate. (c) Preliminary center and axis estimate. (d) Cone with smoothed radius function. (e) Completed analysis.

Fig. 9.15 (a) TV image of a doll. (b) Completed analysis of doll.

biological form. In real data, complexly interrelated GCs are hard to decompose into satisfactory subparts. Without that, the ability to form a satisfactory articulated skeleton is severely restricted. In later work, GCs with spline-based axes and cross sections were used to model organs of the human abdomen [Shani 1980]. Figure 9.16 shows a rendition of a GC fit to a human kidney.
9.3.3 A Discrete Volumetric Version of the Skeleton

An approximate volume representation that can be quite useful is based on an articulated wire-frame skeleton along which spheres (not cross sections) are placed.

Fig. 9.16 Generalized cylinder representation of two kidneys and a spinal column. This coarse, nominal model is refined during examination of CAT data (see Fig. 9.6).

This representation has some of the flavor of an approximate sweep representation. An example of the use of such a representation and a figure are given in Section 7.3.4. This representation was originally conceived for graphics applications (the spheres look the same from any viewpoint) [Badler and Bajcsy 1978]. Collision detection is easy, and three-dimensional objects can be decomposed into spheres automatically [O'Rourke and Badler 1979]. From the spheres, the skeleton may be derived, and so may the surface of the solid. This representation is especially apt for many computer vision applications involving nonrigid bodies if strict surface and volumetric accuracy is not necessary [Badler and O'Rourke 1979].
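Collision detection is easy because two spheres intersect exactly when the distance between their centers is at most the sum of their radii; two sphere-based bodies collide if any pair of their spheres does. The sketch below assumes each body is just a list of (center, radius) spheres placed along its skeleton, and uses a brute-force pairwise test that a practical system would accelerate. The `limb` helper is hypothetical.

```python
import math

def spheres_overlap(c1, r1, c2, r2):
    """True if two closed balls intersect (touching counts as contact)."""
    return math.dist(c1, c2) <= r1 + r2

def bodies_collide(body_a, body_b):
    """Each body is a list of ((x, y, z), radius) spheres along its skeleton."""
    return any(spheres_overlap(ca, ra, cb, rb)
               for ca, ra in body_a for cb, rb in body_b)

def limb(start, end, radius, n=5):
    """Hypothetical helper: n equal spheres evenly spaced along a skeleton segment."""
    return [(tuple(s + (e - s) * k / (n - 1) for s, e in zip(start, end)), radius)
            for k in range(n)]

arm = limb((0, 0, 0), (4, 0, 0), 0.5)
other = limb((2, 3, 0), (2, 0.8, 0), 0.5)   # ends 0.8 units above the arm's axis
far = limb((2, 3, 0), (2, 2, 0), 0.5)       # stays 2 units away
print(bodies_collide(arm, other), bodies_collide(arm, far))  # True False
```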
9.4 VOLUMETRIC REPRESENTATIONS

Most world objects are solids, although usually only their surfaces are visible. A representation of the objects in terms of more primitive solids is often useful and can have pleasant properties of terseness, validity, and sometimes ease of computation. The representations given here are presented in order of increasing generality; constructive solid geometry includes cell decomposition, which in turn includes spatial occupancy arrays. Algorithms for processing volume-based representations are often of a different flavor than surface-based algorithms. We give some examples in Section 9.4.4.

Objects represented volumetrically can be depicted on raster graphics devices by a "ray casting" approach in which a line of sight is constructed through the viewing plane for a set of raster points. The surface of the solid at its intersection with the line of sight determines the value of the display at the raster point. Ray casting can produce hidden-line and shaded displays; graphics is only one of its applications (Section 9.4.4).
9.4.1 Spatial Occupancy

Figure 9.17 shows that three-dimensional spatial occupancy representations are the three-dimensional equivalent of the two-dimensional spatial occupancy representations of Chapter 8. Volumes are represented as a three-dimensional array of cells which may be marked as filled with matter or not. Spatial occupancy arrays can require much storage if resolution is high, since space requirements increase as the cube of linear resolution. In low-resolution work with irregular objects, such as arise in computer-aided tomography, spatial occupancy arrays are very common. It is sometimes useful to convert an exact representation into an approximate spatial occupancy representation. Slices or sections through objects may be easily produced. The spatial occupancy array may be run-length encoded (in one dimension), or coded as blocks of different sizes; such schemes are actually cell decomposition schemes (Section 9.4.2).

With the declining cost of computer memory, explicit spatial occupancy arrays may become increasingly common. The improvement of hardware facilities for parallel computation will encourage the development of parallel algorithms to compute properties of solids from these representations.
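A minimal sketch of a spatial occupancy array, assuming the solid is available as an inside/outside test (here an illustrative sphere of radius 0.9 in the cube [−1, 1]³). A cell is marked filled when its center lies inside. Volume is then the filled count times the cell volume, a section through the solid is just one plane of the array, and the number of cells grows as n³ with linear resolution n.

```python
import math

def occupancy_array(inside, n, lo=-1.0, hi=1.0):
    """n x n x n boolean array; cell (i, j, k) is filled if its center is inside."""
    d = (hi - lo) / n
    centers = [lo + (i + 0.5) * d for i in range(n)]
    grid = [[[inside(x, y, z) for z in centers] for y in centers] for x in centers]
    return grid, d

def volume(grid, d):
    """Filled-cell count times the volume of one cell."""
    return sum(cell for plane in grid for row in plane for cell in row) * d ** 3

def slice_x(grid, i):
    """A section through the solid is simply one plane of the array."""
    return grid[i]

sphere = lambda x, y, z: x * x + y * y + z * z <= 0.81   # radius 0.9
for n in (10, 20, 40):
    grid, d = occupancy_array(sphere, n)
    # Storage grows as n**3; the volume estimate approaches 4/3*pi*0.9**3 (about 3.054).
    print(n, n ** 3, round(volume(grid, d), 3))
```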

Fig. 9.17 A solid (the shape of a human red blood cell) approximated by a volume occupancy array.

9.4.2 Cell Decomposition

In cell decomposition, cells are more complex in shape but still "quasi-disjoint" (do not share volumes), so the only combining operation is "glue" (Fig. 9.18). Cells are usually restricted to have no holes (they are "simply connected"). Cell decompositions are not particularly concise; their construction (especially for curved cells) is best left to programs. It seems difficult to convert other representations exactly into cell decompositions. Two useful cell decompositions are the "oct-tree" [Jackins and Tanimoto 1980] and the k-d tree [Bentley 1975]. They both can be produced by recursive subdivision of volume; these schemes are the three-dimensional analogs of pyramid data structures for two-dimensional binary images.

The quasi-disjointness of cell decomposition and spatial occupancy primitives may be helpful in some algorithms. Mass properties (Section 9.4.4) may be computed on the components and summed. It is possible to tell whether a solid is connected and whether it has voids. Inhomogeneous objects (such as human anatomy inside the thorax) can be represented easily with cell decomposition and spatial occupancy. The CT number (transparency to x-rays) or a material code can be kept in a cell instead of a single-bit indication of "solid or space."

Fig. 9.18 A volume and its cell decomposition.
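The recursive subdivision behind oct-trees can be sketched in a few lines. This version assumes the solid is a sphere, so that a helper can decide exactly whether an axis-aligned box is entirely inside, entirely outside, or mixed (by comparing its nearest and farthest points with the radius). Mixed boxes are split into eight octants down to a maximum depth, where they are treated as full if their center is inside. Because cells are quasi-disjoint, the volume is just the sum of the volumes of the full leaves, which is the "compute on the components and sum" idea from the text. All names are illustrative.

```python
import math

def classify_box(lo, hi, center, r):
    """'in', 'out', or 'mixed' for an axis-aligned box versus a sphere."""
    near = sum((max(l, min(c, h)) - c) ** 2 for l, h, c in zip(lo, hi, center))
    far = sum(max((l - c) ** 2, (h - c) ** 2) for l, h, c in zip(lo, hi, center))
    if far <= r * r:
        return "in"
    if near > r * r:
        return "out"
    return "mixed"

def build_octree(lo, hi, center, r, depth):
    """Return 'in', 'out', or a list of 8 child octrees."""
    state = classify_box(lo, hi, center, r)
    if state != "mixed":
        return state
    mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
    if depth == 0:  # resolution limit: decide by the cell center
        inside = sum((m - c) ** 2 for m, c in zip(mid, center)) <= r * r
        return "in" if inside else "out"
    children = []
    for octant in range(8):  # bit a of octant selects the upper half along axis a
        clo = tuple(mid[a] if octant >> a & 1 else lo[a] for a in range(3))
        chi = tuple(hi[a] if octant >> a & 1 else mid[a] for a in range(3))
        children.append(build_octree(clo, chi, center, r, depth - 1))
    return children

def octree_volume(node, size):
    """Sum the volumes of full leaves; size is the edge length of this node's cube."""
    if node == "in":
        return size ** 3
    if node == "out":
        return 0.0
    return sum(octree_volume(child, size / 2) for child in node)

def count_leaves(node):
    return 1 if isinstance(node, str) else sum(count_leaves(c) for c in node)

tree = build_octree((-1, -1, -1), (1, 1, 1), (0, 0, 0), 0.9, depth=6)
print(octree_volume(tree, 2.0), 4 / 3 * math.pi * 0.9 ** 3, count_leaves(tree))
```

The leaf count is far below the 64³ cells of a uniform array at the same finest resolution, since large interior and exterior blocks are never subdivided.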


9.4.3 Constructive Solid Geometry

Figure 9.19 shows one constructive solid geometry (CSG) scheme [Voelcker and Requicha 1977; Boyse 1979]. Solids are represented as compositions, via set operations, of other solids which may have undergone rigid motions. At the lowest level are primitive solids, which are bounded intersections of closed half-spaces defined by some F(x, y, z) ≥ 0, where F is well behaved (e.g., analytic). Usually, primitives are entities such as arbitrarily scaled rectangular blocks, arbitrarily scaled cylinders and cones, and spheres of arbitrary radius. They may be positioned arbitrarily in space.

Figure 9.20 shows a parameterized representation [Marr and Nishihara 1978; Nishihara 1979] based on shapes (here cylinders) that might be extracted from an image.

A CSG representation is an expression involving primitive solid and set operators for combination and motion.

<CSG Rep> ::= <primitive solid>
            | MOVE <CSG Rep> BY <Motion Params>
            | <CSG Rep> <Combine Op> <CSG Rep>

The combining operators are best taken to be regularized versions of set union, intersection, and difference (the complement is a possible operator, but it allows unbounded solids from bounded primitives).

Regularity is a fundamental property of any set of points that models a solid. In a given space, a set X is regular if X = kiX, where k and i denote the closure and interior operators. Intuitively, a regular set has no isolated or dangling boundary points. The regularization r of a set X is defined by rX = kiX. Regularization informally amounts to taking what is inside a set and covering that with a tight skin. Regular sets are not closed under conventional set operations, but regularized

Fig. 9.19 Constructive solid geometry for the volume of Fig. 9.18.

Fig. 9.20 A parameterized constructive representation for animal shapes (labels: cylinder, limb, thick limb, quadruped, biped, cow, dove).

operators do preserve regularity. Regularized operators are defined by

$$X \langle \mathrm{OP} \rangle^{*}\, Y = r\,(X \langle \mathrm{OP} \rangle Y)$$

Regularity and regularized set operators provide a natural formalization of the dimension-preserving property exhibited by many geometric algorithms, thus obviating the need to enumerate many annoying "special cases." Figure 9.21 illustrates conventional versus regularized intersection of two sets that are regular in the plane.

If the primitives are unbounded, checking for boundedness of an object can be difficult. If they are bounded, any CSG representation is a valid volume representation. CSG can be inefficient for some geometric applications, such as a line-drawing display. (Converting the CSG representation to a boundary representation is one way to proceed; see Section 9.4.4.)
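The CSG grammar maps directly onto a small expression tree. The sketch below is an illustrative assumption, not any particular modeler's design: the primitives are a sphere centered at the origin and an axis-aligned block with a corner at the origin, MOVE is restricted to translation, and the combining operators are evaluated pointwise as ordinary Boolean operations. Pointwise evaluation ignores the ON case and therefore does not implement regularization; it only shows how a CSG expression is walked to decide whether a point lies inside the solid.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Point = Tuple[float, float, float]

@dataclass
class Sphere:
    radius: float  # centered at the origin
    def contains(self, p: Point) -> bool:
        return sum(c * c for c in p) <= self.radius ** 2

@dataclass
class Block:
    size: Point  # occupies [0, size] on each axis
    def contains(self, p: Point) -> bool:
        return all(0 <= c <= s for c, s in zip(p, self.size))

@dataclass
class Move:
    solid: "CSG"
    by: Point  # translation only; a real system would also allow rotations
    def contains(self, p: Point) -> bool:
        # p is in the moved solid if the inversely moved point is in the original.
        return self.solid.contains(tuple(c - d for c, d in zip(p, self.by)))

@dataclass
class Combine:
    left: "CSG"
    op: str  # "union", "intersection", or "difference"
    right: "CSG"
    def contains(self, p: Point) -> bool:
        a, b = self.left.contains(p), self.right.contains(p)
        if self.op == "union":
            return a or b
        if self.op == "intersection":
            return a and b
        if self.op == "difference":
            return a and not b
        raise ValueError(f"unknown operator {self.op!r}")

CSG = Union[Sphere, Block, Move, Combine]

# A 2 x 2 x 2 block with a spherical bite taken out of its far corner.
solid = Combine(Block((2, 2, 2)), "difference", Move(Sphere(1.0), (2, 2, 2)))
print(solid.contains((0.5, 0.5, 0.5)), solid.contains((1.8, 1.8, 1.8)))  # True False
```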

Fig. 9.21 Conventional (∩) and regularized (∩*) polygon intersection.

9.4.4 Algorithms for Solid Representations

Set Membership Classification

The set membership classification (SMC) function M takes a candidate point set C and a reference set S, and returns the points of C that are in S, out of S, and on the boundary of S.

(C in S, C out S, C on S) := M(C, S)

Figure 9.22a shows line-polygon classification. SMC is a generalization of set intersection [Tilove 1980]. It is a useful geometric utility; polygon-polygon classification is generalized clipping, and volume-volume classification detects solid interference. Line-solid classification

Fig. 9.22 (a) The set membership classification (SMC) function M(L, P) finds the portions of the candidate set L (here a line) that are in, on, and out of a reference set (here a polygon) P. (b) Image produced by ray casting, a special case of SMC.

may be used for ray-casting visualization techniques to generate images of a known three-dimensional representation (Fig. 9.22b).

An algorithm for SMC illustrates a "divide-and-conquer" approach to computing on CSG. Recall that CSG is like a tree of set operations, whose leaves are primitive sets which usually are simple solids such as cylinders, spheres, and blocks. Presumably classification can be more easily computed with these simple sets as reference than with complex unions, intersections, and differences as reference. The idea is that the classification of a set C with respect to a complex object S defined in CSG may be determined recursively. Any internal node S in the CSG tree is an operation node. It has left and right arguments and an operation OpOfS. Each subtree is itself a CSG subtree or a primitive.

    M(X, S):
        IF S is a primitive THEN PrimM(X, S)
        ELSE Combine(M(X, LeftSubtree(S)), M(X, RightSubtree(S)), OpOfS);

PrimM is the easily computed classification with respect to a simple primitive solid. The Combine operation is a nontrivial calculation that combines the subresults to produce a more complex classification. It is illustrated in two dimensions for line classification in Fig. 9.23. Having classified the line L against the polygons P1 and P2, the classifications can be combined to produce the classification for P1 ∩ P2. Precise rules for Combine may be written for (regularized) union, intersection, and set difference. An important point is that when a point is in the "on" set of S1 and in the "on" set of S2, the result of the combination depends on extra information. In Fig. 9.23, segments X and Y both result from this ON-ON case of Combine, but segment X is OUT of the boundary of the intersection and Y is IN the intersection. The ambiguity must be resolved by keeping "neighborhood information" (local geometry) attached to point sets, and combining the neighborhoods along with the classifications. The technical problems surrounding Combine can be solved, and SMC is basic in several solid geometric modeling systems [Boyse 1979; Voelcker et al. 1978; Brown et al. 1978].
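The divide-and-conquer structure of M(X, S) can be illustrated for the simplest case: classifying a horizontal line segment against a two-dimensional CSG tree. This sketch rests on several assumptions. Primitives are axis-aligned rectangles, and each classification is kept only as the closed intervals where the segment meets the set. Combine then becomes interval arithmetic, and regularization is approximated by discarding zero-length pieces. The sketch does not separate ON segments lying along an edge from IN segments, which is exactly the distinction the neighborhood bookkeeping in the text is needed for. The names `prim_m` and `m` are hypothetical stand-ins for PrimM and M.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

@dataclass
class Node:
    op: str        # "union", "intersection", or "difference"
    left: object   # Rect or Node
    right: object

def prim_m(y, t0, t1, r):
    """PrimM: closed x-interval where the segment {y} x [t0, t1] meets rectangle r."""
    if not (r.y0 <= y <= r.y1):
        return []
    lo, hi = max(t0, r.x0), min(t1, r.x1)
    return [(lo, hi)] if lo <= hi else []

def normalize(ivs):
    """Sort and merge touching intervals, then drop zero-length pieces (regularize)."""
    out = []
    for a, b in sorted(ivs):
        if out and a <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], b))
        else:
            out.append((a, b))
    return [(a, b) for a, b in out if b > a]

def intersect(A, B):
    return normalize([(max(a0, b0), min(a1, b1)) for a0, a1 in A for b0, b1 in B
                      if max(a0, b0) <= min(a1, b1)])

def subtract(A, B, t0, t1):
    """A minus B: intersect A with the closure of B's complement within [t0, t1]."""
    comp, start = [], t0
    for b0, b1 in normalize(B):
        comp.append((start, b0))
        start = b1
    comp.append((start, t1))
    return intersect(A, [(a, b) for a, b in comp if a <= b])

def m(y, t0, t1, s):
    """M(X, S): recurse down the CSG tree, then Combine the two subresults."""
    if isinstance(s, Rect):
        return normalize(prim_m(y, t0, t1, s))
    left, right = m(y, t0, t1, s.left), m(y, t0, t1, s.right)
    if s.op == "union":
        return normalize(left + right)
    if s.op == "intersection":
        return intersect(left, right)
    if s.op == "difference":
        return subtract(left, right, t0, t1)
    raise ValueError(s.op)

P1, P2 = Rect(0, -1, 2, 1), Rect(2, -1, 4, 1)   # two squares sharing the edge x = 2
print(m(0, -1, 5, Node("union", P1, P2)))         # [(0, 4)]
print(m(0, -1, 5, Node("intersection", P1, P2)))  # [] (the shared edge is regularized away)
print(m(0, -1, 5, Node("difference", Node("union", P1, P2), Rect(1, -1, 3, 1))))  # [(0, 1), (3, 4)]
```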
Fig. 9.23 Combining line-polygon classifications (a) and (b) must produce the classification (c).

Mass Properties

The analog of many two-dimensional geometric properties is to be found in "mass properties," which are defined by volume integrals over a solid. The four types of mass properties commonly of interest are:

$$\text{Volume:}\quad V = \int_S dv$$

$$\text{Centroid: e.g.}\quad G_x = \frac{1}{V}\int_S x\,dv$$

$$\text{Moment of inertia: e.g.}\quad I_{xx} = m\int_S (y^2 + z^2)\,dv$$

$$\text{Product of inertia: e.g.}\quad P_{xy} = m\int_S xy\,dv \tag{9.17}$$


where m is a density measure, dv the volume differential, and integrals are taken over the volume.

Measures such as these are not necessarily easy to compute from a given representation. The calculation of mass properties of solids from various representations is discussed in [Lee and Requicha 1980]. The approaches suggested by the representations are shown in Fig. 9.24.

One method is based on decomposing the solid into quasi-disjoint cells. An integral property of the cell decomposition is just the sum of the property for each of the cells. Hence if computing the property for the cells is easy, the calculation is easy for the whole volume. One is invited to decompose the body into simple cells, such as columns or cubes, as shown in Fig. 9.25. The resulting calculations, performed to reasonable error bounds on fairly complex volumes, take unacceptably long for the pure spatial occupancy enumeration, but are acceptable for the column and block decompositions. (The column decomposition corresponds to a ray-casting approach.) The block decomposition method can be programmed using oct-trees or k-d trees in a manner reminiscent of the Warnock hidden-line algorithm [Warnock 1969], in which the blocks are found automatically, and their size diminishes as increased resolution is needed in the solid. In calculating from a constructive solid geometry representation, the same divide-and-conquer strategy that is useful for SMC may be applied. Again, it recursively solves subproblems induced by the set operators (Fig. 9.26). The strategy is less appealing here since the number of subproblems can grow exponentially in the worst case.

In boundary representations, one can perhaps directly integrate over the boundary in a three-dimensional version of the polygon area calculation given in Chapter 8. This method is often impossible for curved surfaces, which, however, may be approximated by planar faces. An alternative is to use the divergence

Fig. 9.24 "Natural" approaches to computing mass properties from several representations.

Fig. 9.25 Cell decompositions for mass properties.

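theorem (Gauss's theorem). The divergence is a scalar quantity defined at any point in a vector field by writing the vector function as

$$\mathbf{G}(x, y, z) = P(x, y, z)\,\mathbf{i} + Q(x, y, z)\,\mathbf{j} + R(x, y, z)\,\mathbf{k} \tag{9.18}$$

The divergence is

$$\operatorname{div}\mathbf{G} = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z} \tag{9.19}$$

There is always a function G such that div G = f(x, y, z) for any continuous function f. (f computes the integral property of interest.) Thus

$$\int_S f\,dv = \int_S \operatorname{div}\mathbf{G}\,dv \tag{9.20}$$

But the divergence theorem states that

$$\int_S \operatorname{div}\mathbf{G}\,dv = \sum_i \int_{F_i} \mathbf{G}\cdot\mathbf{n}_i\,dF_i \tag{9.21}$$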

where F_i is a face of the solid S, n_i is the unit normal to F_i, and dF_i the surface differential. Again this formula works well for planar faces, but may require approximation techniques for curved faces with complex boundaries.

Boundary Evaluation

The calculation of a face-based surface (boundary) representation from a

Fig. 9.26 Recursive problem decomposition for mass property calculation (columns: divide-and-conquer, reduction formula, example).

CSG representation is called boundary evaluation. It is an example of representation conversion. Both the CSG and boundary are usually unambiguous representations of a volume; a CSG expression (a solid) has just one boundary, but a boundary (representing a solid) usually has many CSG expressions. Since a solid may be put together from primitives in many ways, the mapping back from boundary to CSG is not usually attempted (but see [Markowsky and Wesley 1980; Wesley and Markowsky 1981]).

One style of boundary evaluation is based on the following observations [Voelcker and Requicha 1980; Boyse 1979]. Boundaries of composite objects may be computed from certain set-theoretic formulae. For (regularized) intersection of two objects S and T, the formula is

$$b(S \cap^{*} T) = (bS \cap^{*} iT) \cup^{*} (iS \cap^{*} bT) \cup^{*} \big(bS \cap^{*} bT \cap^{*} ki(S \cap^{*} T)\big) \tag{9.22}$$

where ∩* and ∪* are regularized intersection and union; b, i, and k are the boundary, interior, and closure operators. (Recall that ki is r, the regularization operator.) Faces of composite objects can arise only from faces of primitives. Faces are either bounded by edges or are self-closing (as is the sphere).

These observations and the existence of the classification operation motivate the grand strategy that follows (ignoring several important details and concentrating on the core of the algorithm).

1. Find all possible ("tentative") edges for each face of each primitive in the composite.
2. Classify each tentative edge with respect to the composite solid.
3. The ON portions of those edges must be enough to define the boundary.

Given the grand strategy, several algorithms of varying sophistication are possible, depending on what edges should be classified (how to generate tentative edges), in what order they should be classified, and how classification is done. The following algorithm is very simple (but very inefficient); useful algorithms are rather more complex.

Algorithm 9.1: CSG to Boundary Conversion (top-level control loop)

Input: Solid defined by CSG expression of regularized set operations applied to primitive solids.

Output: "B-faces" in the object boundary. B-faces are represented by their bounding edges. They may have little relation to the "intuitive faces" of the boundary; they may overlap each other, and a B-face may be disconnected (specify more than one region). Edges may appear many times. The B-face-oriented boundary may be processed to remove repetition and merge B-faces into more intuitively appealing boundary faces.

BEGIN
  Form a list PFaces of all ("intuitive") faces of primitive solids involved in the CSG expression, and an initially empty list BFaces to hold the output faces.
  For every PFace F1 in PFaces:
    Create a BFace called ThisBFace, initially with no edges in it.
    For every PFace F2 after F1 in the PFaces list (this generates all distinct pairs of PFaces just once):
      Intersect F1 and F2 to get TEdges, a set of edges tentatively on the boundary of the solid. If F1 and F2 do not intersect or intersect only in a point, TEdges is empty. If they intersect in a line, TEdges is the single resulting edge. If they intersect in a two-dimensional region, TEdges contains the bounding edges of the intersection region.
      Classify every TEdge in TEdges with respect to the whole solid (the CSG expression). Put TEdges that are ON the solid boundary into ThisBFace.
    End Inner Loop
    If ThisBFace is not empty, put it into BFaces.
  End Outer Loop
END

Algorithms such as this involve many technical issues, such as merging coplanar faces, stitching edges together into faces, regularization of faces, and removing multiple versions of edges. Boundary evaluation is inherently rather complex, and depends on such things as the definition and representation of faces as well as the geometric utilities taken as basic [Voelcker and Requicha 1981]. Boundary evaluation is an example of exact conversion between significantly different representations. Such conversions are useful, since no single representation seems convenient for all geometric calculations.
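A toy two-dimensional analogue of the three-step strategy can make it concrete. This is an illustration of the strategy under stated assumptions, not Algorithm 9.1 itself: the primitives are axis-aligned rectangles (x0, y0, x1, y1), and the composite is given by a point-membership test. In 2-D the "faces" of the text become edges, and the tentative "edges" become edge pieces, cut wherever another rectangle's edge could cross. Each piece is classified by probing a small distance eps to either side of its midpoint, a crude stand-in for the neighborhood information discussed under SMC. A piece is ON when exactly one side is inside.

```python
def in_rect(r, x, y):
    """Closed point-in-rectangle test; r = (x0, y0, x1, y1)."""
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]

def boundary_pieces(rects, inside, eps=1e-6):
    xs = sorted({r[0] for r in rects} | {r[2] for r in rects})
    ys = sorted({r[1] for r in rects} | {r[3] for r in rects})
    tentative = []
    for x0, y0, x1, y1 in rects:
        # Step 1: tentative edges = primitive edges cut wherever another edge could cross.
        for y in (y0, y1):
            cuts = [x0] + [x for x in xs if x0 < x < x1] + [x1]
            tentative += [((a, y), (b, y)) for a, b in zip(cuts, cuts[1:])]
        for x in (x0, x1):
            cuts = [y0] + [y for y in ys if y0 < y < y1] + [y1]
            tentative += [((x, a), (x, b)) for a, b in zip(cuts, cuts[1:])]
    on = set()
    for (ax, ay), (bx, by) in tentative:
        # Step 2: classify each piece by probing just to either side of its midpoint.
        mx, my = (ax + bx) / 2, (ay + by) / 2
        nx, ny = (0, eps) if ay == by else (eps, 0)
        if inside(mx + nx, my + ny) != inside(mx - nx, my - ny):
            # Step 3: ON pieces form the boundary; the set drops repeated pieces.
            on.add(((ax, ay), (bx, by)))
    return on

def total_length(segments):
    return sum(abs(b[0] - a[0]) + abs(b[1] - a[1]) for a, b in segments)

R1, R2 = (0, 0, 2, 2), (1, 1, 3, 3)
union = boundary_pieces([R1, R2], lambda x, y: in_rect(R1, x, y) or in_rect(R2, x, y))
inter = boundary_pieces([R1, R2], lambda x, y: in_rect(R1, x, y) and in_rect(R2, x, y))
print(total_length(union), total_length(inter))  # 12 4
```

The union's boundary is the 12-unit outline of the two overlapping squares, and the intersection's is the unit square where they overlap. Pieces of each square that lie inside the other are correctly rejected.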

9.5 UNDERSTANDING LINE DRAWINGS

"Engineering" line drawings have been (and to a great extent are still) the main medium of communication between human beings about quantitative aspects of three-dimensional objects. The line drawings of this section are only those which are meant to represent a simple domain of polyhedral or simply curved objects. Interpretation of "naturalistic" drawings (such as a sketch map [Mackworth 1977]) is another matter altogether.

Line drawings (even in a restricted domain) are often ambiguous; interpreting them sometimes takes knowledge of everyday physics, and can require training. Such informed interpretation means that even drawings that are strictly nonsense can be understood and interpreted as they were meant. Missing lines in drawings of polyhedra are often so easy to supply as to pass unnoticed, or be "automatically supplied" by our model-driven perception.

Generalizing the line drawing to three dimensions as a list of lines or points is not enough to make an unambiguous representation, as is shown by Fig. 9.27,

Fig. 9.27 An ambiguous (wire-frame) representation of a solid with two of three possible interpretations.


which illustrates that a set of vertices or edges can define many different solids. (It is possible, however, to determine algorithmically all possible polyhedral boundaries described by a three-dimensional wire frame [Markowsky and Wesley 1980].) A line drawing nevertheless does convey three-dimensional information. For any set of N projection specifications (e.g., viewpoint and camera transform), a wire-frame object may be constructed that is ambiguous given the N projections. However, for a given object, there is a maximum number of projections needed to determine the object unambiguously. The number depends on the number of edges in the object [Shapira 1974]. Reconstruction of all solids represented by projections is possible [Wesley and Markowsky 1981].

Line drawings were a natural early target for computer vision for the following reasons:

1. They are related closely to surface features of polyhedral scenes.
2. They may be represented exactly; the noise and incomplete visual processing that may have affected the "line drawing extraction" can be modelled at will or completely eliminated.
3. They present an interpretation problem that is significant but seems approachable.

The understanding of simple engineering (3-view) drawings was the first stage in a versatile robot assembly system [Ejiri et al. 1971]. This application underlined the fact that heuristics and conventions are indispensable in engineering drawing understanding. This section deals with the problem of "understanding" a single-view line drawing representation of scenes containing polyhedral and simple curved objects like those in Fig. 9.28.

Our exposition follows a historical path, to show how early heuristic programs in the middle 1960s evolved into more theoretical insights in the early 1970s. The first real computer vision program with representations of a three-dimensional domain appeared around 1963 [Roberts 1965]. This system, ambitious even by today's standards, was to accept a digitized image of a polyhedral scene and produce a line drawing of the scene as it would appear when viewed from any requested viewpoint.
This work addressed basic issues of imaging geometry, feature finding, object representation, matching, and computer graphics.

Since then, several systems have appeared for accomplishing either the same or similar results [Falk 1972; Shirai 1975; Turner 1974]. The line drawings of this section can appear as intermediate representations in a working polyhedral vision system, but they have also been studied in isolation. This topic took on a life of its own and provides a very pretty example of the general idea of going to the three-dimensional world of physics and geometry to understand the appearance of a two-dimensional image. The later results can be used to understand more clearly the successes and failures of early polyhedral vision systems. One form of understanding (line labelling) provided one of the first and most convincing demonstrations of parallel constraint propagation as a control structure for a computer vision process.


Fig. 9.28 Several typical line-drawing scenes for computer understanding.

9.5.1 Matching Line Drawings to Three-Dimensional Primitives

Roberts desires to interpret a line drawing such as Fig. 9.28a in terms of a small set of three polyhedral primitives, shown in Fig. 9.29. A simple polyhedron in a scene is regarded as an instance of a transformed primitive, where a transform may involve scaling along the three coordinate axes, translation, and rotation. Compound polyhedra, such as Fig. 9.28a, are regarded as simple polyhedra "glued together." (A cell decomposition representation is thus used for compound polyhedra.) The program is first to derive from the scene the identity of the primitive objects used to construct it (including details of the construction of compound polyhedra). Next, it is to discover the transformations applied to the primitives to obtain the particular incarnations making up the scene. Finally, to demonstrate its understanding, it should be able to construct a line drawing of the scene from any viewpoint, using its derived description.

To understand a part of the scene, the program first decides which primitive it comes from, and then derives the transformation the primitive underwent to appear as it does in the scene. Identifying primitives is done by matching "topological" features of the line drawing (configurations of faces, lines, and vertices) with those of the model primitives; matching features induce a match between scene and model points. At least four noncoplanar matching points are needed to derive

Fig. 9.31 Topological match structures of Roberts.

The idea once again is to accumulate local evidence from the scene, and then to group polygons on the basis of this evidence. The evidence takes the form of "links" which link two regions if they may belong to the same body; links are planted around vertices, which are classified into types, each type always planting the same links (Fig. 9.32). No links are made with the background region. Scenes are interpreted by grouping according to regions/links, using fairly complex rules, including "inhibitory links" that preclude two neighboring regions from being in the same body.

The final form of the program performs reasonably well on scenes without accidents of visual alignment, but it is a maze of special cases and exceptions, and seems to shed little light on what is going on in known polyhedral line drawing perception. One might well ask where the links come from; no justification of why they are correct is given. Further ([Mackworth 1973]), Guzman can accept as one body the two regions in Fig. 9.33a. Finally, one feels a little dissatisfied with a scheme that just answers "one body" to a scene like Fig. 9.33b, instead of answering "pyramid on cube" or "two wedges," for example.

Guzman's method is correct for a world of convex isolated trihedral polyhedra: it is extended by ad hoc adjustments based on various potentially conflicting items of evidence from the line drawing. Ultimately it performs adequately with a much increased range of scenes, albeit not very elegantly. Further progress in the line drawing domain came about when attention was directed at the three-dimensional causes of the different vertex types.
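The final grouping step can be sketched with a union-find structure. The rule used here is an assumption for illustration, not Guzman's full rule set: two regions are merged into one body when at least two links join them, pairs named as inhibited are never merged directly, and no background links are supplied. This simple version checks inhibition only for the directly linked pair, so a chain of strong links could still join an inhibited pair. The region names and link list are hypothetical.

```python
from collections import Counter

def group_regions(regions, links, inhibited=(), min_links=2):
    """Group regions into bodies; links is a list of (r1, r2) pairs planted at vertices."""
    parent = {r: r for r in regions}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    blocked = {frozenset(p) for p in inhibited}
    counts = Counter(frozenset(link) for link in links)   # order of a pair is irrelevant
    for pair, n in counts.items():
        if n >= min_links and pair not in blocked and len(pair) == 2:
            a, b = tuple(pair)
            parent[find(a)] = find(b)
    bodies = {}
    for r in regions:
        bodies.setdefault(find(r), []).append(r)
    return sorted(sorted(b) for b in bodies.values())

regions = ["A", "B", "C", "D", "E"]
links = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("A", "C"), ("C", "A"),
         ("D", "E"), ("E", "D"), ("C", "D")]   # one weak link between the two bodies
print(group_regions(regions, links))   # [['A', 'B', 'C'], ['D', 'E']]
```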

Fig. 9.32 Links around vertices (vertex types shown: FORK, ARROW, ELL, PSI, PEAK).

Fig. 9.33 (a) Nonpolyhedral scene. (b) Two wedges or a pyramid on cube.

9.5.3 Labeling Lines

Huffman and Clowes independently concerned themselves with scenes similar to Guzman's, not excluding non-simply connected polyhedra, but excluding accidents of alignment [Huffman 1971; Clowes 1971]. They desired to say more about the scene than just which regions arose from single bodies; they wanted to ascribe interpretations to the lines. Figure 9.34 shows a cube resting on the floor; lines labeled with a + are caused by a convex edge, those labeled with a − are caused by a concave edge, and those labeled with a → are caused by matter occluding a surface behind it. The occluding matter is to the right of the line looking in the direction of the →; the occluded surface is to the left. If the cube were floating, one would label the lowest lines with ← instead of with −. The shadow line labels (arrows) were not used by Huffman.

A systematic investigation can find the types of lines possibly seen around a trihedral corner; such corners can be classified by how many octants of space are filled by matter around them (one for the corner of a cube, seven for the inside corner of a room, etc.). By considering all possible trihedral corners as seen from

Fig. 9.34 A block resting on its bottom surface.



all possible viewpoints, Huffman and Clowes found that without occlusion, just four vertex types and only a few of the possible labelings of lines meeting at a vertex can occur. Figure 9.35 shows views of one- and three-octant corners which give rise to all possible vertices for these corner types. The vertices appear in the first two rows of Table 9.1, which is a catalogue of all possible vertices, including those arising from occlusion, in this restricted world of trihedral polyhedra. It is easy to imagine extending the catalog to include vertices for other corner types.

It is important to note that there are four possible labels for each line (+, −, →, ←), and thus 4³ = 64 possible labelings for the fork, arrow, and T, and 16 possible labelings for the ell. In the catalog, however, only 3/64, 3/64, 4/64, and 6/16, respectively, of the possible labelings actually occur. Thus only a small fraction of possible labelings can occur in a scene.

The main observation that lets line-labeling analysis work is the coherence rule: In a real polyhedral scene, no line may change its interpretation (label) between vertices. For example, what is wrong with scenes like Fig. 9.36 is that they cannot be coherently labeled; lines change their interpretation within the impossible object. Perhaps the lines in drawings of real scenes can be interpreted quickly because the small percentage of meaningful labelings interacts with the coherence rule to reduce drastically the number of explanations for the scene.

How does line labeling relate to Guzman? A labeled line description clearly indicates the grouping of regions into bodies, and also rejects scenes like Fig. 9.33a, which cannot be coherently labeled with labels from the catalog. The origin of Guzman's links can be explained this way: consider again the world of convex polyhedra; the only labels from the catalog that are possible are shown in Fig. 9.37a. Further, it is clear that a convex edge has two faces of the same body on either side of it, and an occluding edge has faces from two different bodies on either side of it. A convex label means the regions on either side of it should be linked; this is Guzman's link-planting rule (Fig. 9.37b). The inhibition rules are a further corollary of the labels; they are to suppress links across an edge if evidence that it

Fig. 9.35 Different views of various corner types.



Table 9.1 VERTEX CATALOGUE (junction labelings arranged by number of octants filled and number of visible surfaces, plus vertices arising from occlusion)

must be occluding is supplied by the vertex at its other end (Fig. 9.37c). When vertices at both ends of a line agree that the line is convex, Guzman would have planted two links; this is in fact the strongest evidence that the regions are part of the same body. If just one vertex gives evidence that the edge has a link, a decision based on heuristics is made; the coherence rule is being used implicitly by Guzman. The same physical and geometric reality is driving both his scheme and that of Huffman.

The labeling scheme explained here still has problems: syntactically nonsensical scenes are coherently labeled (Fig. 9.38a); scenes are given geometrically impossible labels (Fig. 9.38b); and scenes that cannot arise from polyhedra are easily labelled (Fig. 9.38c). It is very hard to see how a labeling scheme can detect the illegality of scenes like Fig. 9.38c; the problem is not that the edges are incorrectly labeled, but that the faces cannot be planar.

Concern with this last-mentioned problem led to a program (see the next section) that can obtain information about a polyhedral scene equivalent to labeling it,

Fig. 9.36 An impossible object.

Ch. 9 Representations of Three-Dimensional Structures


Fig. 9.37 The relation of links to labels. (a) Line labels. (b) Link-planting vertices. (c) Inhibitory links.

and also can reject nonpolyhedra as impossible.

There has also been an exciting denouement to the line-labeling idea [Waltz 1975; Turner 1974]. Waltz extends the line labels to include shadows, three illumination codes for each face on the side of an edge, and the separability of bodies in the scene at cracks and concave edges; this brings the number of possible line labels up to just below 100. He also extends the possible vertex types, so that many vertices of four lines occur. He can deal with scenes such as the one shown in Fig. 9.28c.

The combinatorial consequence of these extensions is clear: the possible vertex labelings multiply enormously. The first interesting thing Waltz discovered was that despite the combinatorics, as more information is coded into the lines, the percentage of geometrically meaningful labels for a vertex becomes smaller. In his final version, only approximately 0.03 percent of the possible arrow labels can occur, and for some vertices the percentage is approximately 0.000001.

The second interesting thing Waltz did was to use a constraint-propagating labeling algorithm which very quickly eliminates labels for a vertex that are impossible given the neighboring vertices and the coherence rule, which places constraints on labelings. The small number of meaningful labels for a vertex imposes severe constraints on the labeling of neighboring vertices. By the coherence rule, the constraints may be passed around the scene from each vertex to its neighbors; eliminating a label for a vertex may render neighboring labels illegal as well, and so on recursively.
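The propagation just described can be sketched as a generic arc-consistency loop. This is an illustrative sketch, not Waltz's program: each junction carries a set of candidate labelings (one label per incident line), and whenever a junction's set shrinks, its neighbors are rechecked against the coherence rule. The junction names, line names, and the three small "catalogs" below are hypothetical and are not rows of the Huffman-Clowes or Waltz catalogs; labels are assumed to be expressed relative to one fixed direction per line, so that neighbors can compare them directly.

```python
from collections import deque


def waltz_filter(junctions):
    """Arc-consistency filtering of junction labelings (after Waltz).

    `junctions` maps a junction name to (lines, labelings): `lines` is a
    tuple of line names meeting at the junction, and `labelings` is a set
    of label tuples, one label per line in the same order. Returns the
    labelings that survive for each junction.
    """
    domains = {j: set(labs) for j, (_, labs) in junctions.items()}
    # Index: line name -> junctions at either end of that line.
    touching = {}
    for j, (lines, _) in junctions.items():
        for line in lines:
            touching.setdefault(line, []).append(j)

    queue = deque(junctions)  # every junction is checked at least once
    while queue:
        j = queue.popleft()
        for i, line in enumerate(junctions[j][0]):
            # Labels that junction j still allows on this line.
            allowed = {lab[i] for lab in domains[j]}
            for k in touching[line]:
                if k == j:
                    continue
                k_idx = junctions[k][0].index(line)
                # Coherence rule: k may not give this line a label j forbids.
                bad = {lab for lab in domains[k] if lab[k_idx] not in allowed}
                if bad:
                    domains[k] -= bad
                    queue.append(k)  # k shrank, so recheck k's other neighbors
    return domains


# Hypothetical three-junction drawing with illustrative label sets.
drawing = {
    "A": (("a", "b"), {(">", ">"), ("+", "-"), ("-", "+")}),
    "B": (("b", "c"), {(">", "<"), ("+", "+")}),
    "C": (("c", "a"), {("<", ">"), ("-", "-")}),
}
result = waltz_filter(drawing)
print(result)
```

For this toy input every junction is left with exactly one labeling: A keeps (>, >), B keeps (>, <), and C keeps (<, >). When several labelings survive filtering, a tree search over the remaining sets is still needed to extract explicit labelings.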


Fig. 9.38 Nonsense labelings and nonpolyhedra.


Waltz found that for scenes of moderate complexity, eliminating all impossible labelings left only one, the correct one. The labeling process, which might have been expected to involve much search, usually involved none. This constraint propagation is an example of parallel constraint satisfaction, and is discussed in Chapter 12 in a broader context. In the event that a vertex is left with several labels after all junction coherence constraints have been applied, each remaining label is consistent with some label at every neighboring vertex; this local consistency does not by itself guarantee that every remaining label participates in a complete legal labeling. At this point one can resort to tree search to find the explicit labelings, or one can apply more constraints. Many such constraints, heuristic and geometric, may be imagined. For instance, a constraint could involve color edge profiles. If two aligned edges are separated by some (possibly occluding) structure, but still divide faces of the same color, they should have the same label. Another important constraint concerns how face planarity constrains line orientations.

Scenes with missing lines may be labeled; one merely adds to the legal vertex catalog the vertices that result if lines are missing from legal vertices. This idea has the drawbacks of increasing the vertex catalog and widening the notion of consistency, but can be useful.

Another extension to line labeling is that of [Kanade 1978]. This extension considers not only solid polyhedra but objects (including nonclosed "shells") made up of planar faces. This extension has been called origami world after the art of making objects from folded (mostly planar) paper. An example from origami world is the box in Fig. 9.39a. A quick check shows that this cannot be labeled with the Huffman-Clowes label set. It can be labeled using the origami world label set (Table 9.2), and its interpretation is shown in Fig. 9.39b.

Table 9.2
EXPANDED JUNCTION TABLE
[Junction drawings not recoverable from the scan. Surviving column labels: ELL, FORK, ARROW.]

Fig. 9.39 (a) Box. (b) Labeled edges according to origami world label set.

The vertex labels may be extended to include scenes with cylinders, cones, spheres, tori, and other simple curves. In expanded domains the notion of "legal line drawing" becomes very imprecise. In any event the number of vertex types and labels grows explosively, and the coherence rule must be modified to cope with the fact that lines can change their interpretation between vertices and can tail off into nothing, and that one region can attain all three of Waltz's illumination types [Turner 1974, Chakravarty 1979]. The domain is of scenes such as appear in Fig. 9.28d.
9.5.4 Reasoning About Planes

The deficiencies in the scene line-labeling algorithms prompted a consideration of the geometrical foundations of the junction labels [Mackworth 1973, Sugihara 1981]. This work seeks to answer the same sorts of questions as do labeling programs, but also to take account of objects that cannot possibly be planar polyhedra, such as those of Fig. 9.40. Neither approach uses a catalog of junction labels, but relies instead on ideas of geometric coherence. The basis is a plane-oriented formulation rather than a line-oriented one.

Gradient Space

Mackworth's program relies heavily on the relation of polyhedral surface gradients to the lines in the image (recall Section 3.5.2). Image information from orthographic projections of planar polyhedral scenes may be related to gradient information in a useful way. An image line L is the projection of a three-space line M arising from the intersection of two faces lying in distinct planes $\Pi_1$ and $\Pi_2$ of gradients $(p_1, q_1)$ and $(p_2, q_2)$. With the (p, q) coordinate system superimposed on the image (x, y) coordinate system, there is the following constraint. The orientation of L constrains the gradients of $\Pi_1$ and $\Pi_2$; specifically, the line L is perpendicular to the line G between $(p_1, q_1)$ and $(p_2, q_2)$ (Fig. 9.41).

Fig. 9.40 Labelable but not planar polyhedra.


Fig. 9.41 Gradient-space constraint.

The result is easily shown. With orthographic projection, the origin of the image plane may be moved to lie on L without loss of generality. Then L is defined by its direction vector $(\lambda, \mu) = (\cos\theta, \sin\theta)$. The three-space point on M corresponding to (0, 0) may be expressed as $(0, 0, k_1)$, and at $(\lambda, \mu)$ the corresponding point is $(\lambda, \mu, \lambda p_1 + \mu q_1 + k_1)$. Thus moving along M (which is in $\Pi_1$) from (x, y) = (0, 0) to (x, y) = $(\lambda, \mu)$ moves along z by $\lambda p_1 + \mu q_1$. A vector along M whose image is the unit vector on L can then be expressed as $(\lambda, \mu, \lambda p_1 + \mu q_1)$. But M is also in $\Pi_2$, and this argument may be repeated for $\Pi_2$, using $p_2$ and $q_2$. Thus

$$\lambda p_1 + \mu q_1 = \lambda p_2 + \mu q_2 \qquad (9.23)$$

or

$$(\lambda, \mu) \cdot (p_1 - p_2,\ q_1 - q_2) = 0 \qquad (9.24)$$

Equation (9.24) is a dot product set equal to zero, showing that its two vector operands are orthogonal, which was to be shown.

Every picture line results from the intersection of two planes, and so it has a line associated with it in gradient space which is perpendicular to it. Furthermore, if the gradients of the surfaces are on the same side of the picture line as their surfaces, the edge was convex; if the gradients are on opposite sides of the line from their causing surfaces, the edge was concave (Fig. 9.42). For every junction in the image there are just two ways the gradients can be arranged to satisfy the perpendicularity requirement (Fig. 9.43). In the first, all edges are convex; in the second, concave. Switching interpretations from one to the other by negating gradients is the psychological "Necker reversal."

Notice that if an image junction is a three-space polyhedral vertex, each edge of the vertex is the intersection of two face planes. If the corresponding gradients are connected, a "dual" (p, q)-space representation of the (x, y)-space junction is formed. The connected (p, q) gradient points form a polygon whose edges are perpendicular to the junction lines in (x, y) space. The polygon is larger if the three-dimensional corner is sharper, and shrinks toward the junction point as the corner gets blunter.

Fig. 9.42 Relation of gradients, image and world structures. (a) Image. (b) World. (c) Gradients.

Interpreting Drawings

It is possible to use these geometric results to interpret the lines in orthographically projected polyhedral scenes as being "connect" (i.e., as being between two connected faces) or occluding. It can also be determined whether connect edges are convex or concave, and, for occluding edges, which surface is in front. Hidden parts of the scene may sometimes be reconstructed. The orientation of each surface and edge in the scene may be found. Thus a program can determine that input such as Fig. 9.40 is not a planar-faced polyhedron [Mackworth 1973]. Sugihara's work generalizes Mackworth's; it does not use gradient space and does not rely on orthographic projection.
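The perpendicularity result of Eqs. (9.23) and (9.24) can be checked numerically. The sketch below assumes each face is the plane $z = px + qy + c$ (the convention under which moving by $(\lambda, \mu)$ raises z by $\lambda p + \mu q$, as in the derivation), so the face normal is $(p, q, -1)$; the edge direction M is the cross product of the two normals, and orthographic projection drops its z component. The function names and the sample gradients are hypothetical illustrations, not part of Mackworth's program.

```python
import math


def cross(u, v):
    """3-D cross product of two 3-tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def edge_image_direction(g1, g2):
    """Unit image direction (lambda, mu) of the edge where two faces meet.

    Each face is assumed to be a plane z = p*x + q*y + c with gradient
    g = (p, q), so its normal is (p, q, -1).
    """
    (p1, q1), (p2, q2) = g1, g2
    dx, dy, _ = cross((p1, q1, -1.0), (p2, q2, -1.0))
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("equal gradients: the planes are parallel and do not meet in an edge")
    return dx / norm, dy / norm


def gradient_space_check(g1, g2):
    """Return both sides of Eq. (9.23) and the dot product of Eq. (9.24)."""
    lam, mu = edge_image_direction(g1, g2)
    (p1, q1), (p2, q2) = g1, g2
    lhs = lam * p1 + mu * q1  # rise in z along M, measured in face 1
    rhs = lam * p2 + mu * q2  # the same rise, measured in face 2
    dot = lam * (p1 - p2) + mu * (q1 - q2)
    return lhs, rhs, dot


print(gradient_space_check((0.5, -1.0), (2.0, 0.25)))
```

For any pair of distinct gradients, the two sides of Eq. (9.23) agree and the dot product is zero up to floating-point rounding, which is the statement that the image line L is perpendicular to the gradient-space segment G.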

Fig. 9.43 A scene junction and two resulting triangles in gradient space.

Mackworth's procedure to establish connect edges produces the most connected interpretation first (a nonconnected interpretation is just a collection of floating faces which line up by accident to give the line drawing). The background region is the first to be interpreted, that is, to have its gradient fixed in gradient space. After a region is interpreted, the region having the most lines in common with regions so far interpreted is interpreted next.

The image of a scene is given in Fig. 9.44a; it is interpreted as follows. No coherent interpretation is possible with five or four connect edges. Trying for three connect edges, the program interprets A by arbitrarily picking a gradient for the surface A represents (the background). It picks the origin of gradient space. In order to be able to reason about lines in the image, it needs to have an interpreted region on either side of the line, so it must interpret another region. It picks B (C would be as good).

The lines bounding B are examined to see if they are connect. Line 1 is considered. If it is connect, the gradient space dual of it will be perpendicular to it through the gradient space point representing surface A (i.e., the origin). Now another arbitrary choice: The gradient corresponding to surface B is placed at unit distance from the origin, thus "imagining" the second gradient, the second arbitrary choice in a row. From now on, the gradients are more strongly located. The arbitrary scaling and point of origin imposed by these first two choices can be changed later if that is important. In gradient space, the situation is now as shown in Fig. 9.44b. Now consider line 2; to establish it as a connect edge, $G_B = (p_B, q_B)$ (the gradient space point corresponding to the surface B) must lie on a line perpendicular to 2 through $G_A$ (Fig. 9.44c). This cannot happen; the situation with 1 and 2 both connect is incoherent. Thus, with line 1 a connect edge, 2 must be occluding. This sort of incoherency result was what kept the program from finding four or five edges connect.
Further interpretation involves assigning gradients and vertices into the developing diagram in a noncontradictory, maximally connected manner (Fig. 9.44d). The next part of the program determines convexity or concavity of the lines. The final part of the program looks at occlusion. It also suggests hidden surfaces

Fig. 9.44 (a) Polyhedral scene considered by Mackworth. (b) Partial interpretation. (c) Continued interpretation. (d) Occluding and connect interpretations. (e) Final interpretation.

and thus hidden lines that are consistent with the interpretation (Fig. 9.44e). This figure in gradient space resembles a tetrahedron, as well it might; it is formed in the same way as the graph-theoretic dual (point per face, edge per edge, face per point) which defines dual graphs and dual polyhedra; the tetrahedron is self-dual. The arbitrary choices of gradient reflect degrees of freedom in the drawing that are also identified by Sugihara.
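The incoherence step in the walkthrough above (line 1 connect forces line 2 to be occluding) reduces to a single test from Eq. (9.24): a line can be a connect edge between faces A and B only if the gradient-space segment from $G_A$ to $G_B$ is perpendicular to it. The sketch below replays the two arbitrary choices ($G_A$ at the origin, $G_B$ at unit distance perpendicular to line 1) for two hypothetical, nonparallel image directions standing in for lines 1 and 2 of Fig. 9.44a. It illustrates only this one consistency check, not Mackworth's full search over connect and occluding assignments.

```python
import math


def unit_normal(direction):
    """Unit vector perpendicular to an image line direction (dx, dy)."""
    dx, dy = direction
    n = math.hypot(dx, dy)
    return (-dy / n, dx / n)


def connect_consistent(line_dir, g_a, g_b, tol=1e-9):
    """Can the image line be a connect edge between faces A and B?

    Eq. (9.24): the gradient-space segment from G_A to G_B must be
    perpendicular to the image line. The gradients must also differ,
    since a connect edge joins two distinct planes.
    """
    dx, dy = line_dir
    n = math.hypot(dx, dy)
    ex, ey = g_a[0] - g_b[0], g_a[1] - g_b[1]
    perpendicular = abs((dx * ex + dy * ey) / n) <= tol
    distinct = math.hypot(ex, ey) > tol
    return perpendicular and distinct


# Hypothetical image directions for lines 1 and 2, both bounding regions A and B.
line1 = (1.0, 0.2)
line2 = (0.3, 1.0)

g_a = (0.0, 0.0)          # first arbitrary choice: background A at the origin
g_b = unit_normal(line1)  # second arbitrary choice: B at unit distance, perpendicular to line 1

print(connect_consistent(line1, g_a, g_b))  # True
print(connect_consistent(line2, g_a, g_b))  # False: line 2 must be occluding
```

Because $G_B - G_A$ cannot be perpendicular to two nonparallel image lines at once, no choice of $G_B$ would make both lines connect; the fixed choice above just makes the conflict concrete.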

Skewed Symmetry

Many planar objects are symmetrical about an axis. This axis and another, which is perpendicular to the first and in the plane of the object, form a natural orthogonal coordinate system for the object. If the plane of the object is perpendicular to the line of sight from the viewpoint, the coordinate axes appear to be at right angles. If the object is tilted from this position, the axes appear skewed. Some examples are shown in Fig. 9.45.

A skewed symmetry may or may not reflect a real symmetry; the object may itself be skewed. However, if the skewed symmetry results from a tilted real symmetry, a constraint in gradient space may be developed for the object's orientation [Kanade 1979]. An imaged unit vector inclined at $\alpha$ inscribed on a plane at orientation $(p, q)$ must have three-dimensional coordinates given by

$$(\cos\alpha,\ \sin\alpha,\ p\cos\alpha + q\sin\alpha)$$

Thus if the two axes of skewed symmetry make angles of $\alpha$ and $\beta$ with the image x axis, the two vectors in three-space $\mathbf{a}$ and $\mathbf{b}$ must have coordinates

$$\mathbf{a} = (\cos\alpha,\ \sin\alpha,\ p\cos\alpha + q\sin\alpha)$$

and

$$\mathbf{b} = (\cos\beta,\ \sin\beta,\ p\cos\beta + q\sin\beta)$$

Since these vectors reflect a real symmetry, they must be perpendicular (i.e., $\mathbf{a} \cdot \mathbf{b} = 0$), or

$$\cos(\alpha - \beta) + (p\cos\alpha + q\sin\alpha)(p\cos\beta + q\sin\beta) = 0 \qquad (9.25)$$

By rotating the p and q axes by $\lambda = (\alpha + \beta)/2$, that is,

$$p' = p\cos\lambda + q\sin\lambda, \qquad q' = -p\sin\lambda + q\cos\lambda$$


Fig. 9.45 Skewed symmetries. (a, b, c) Examples. (d) Each skewed symmetry defines two axes.



Equation (9.25) can be put into the form

$$p'^2\cos^2\frac{\gamma}{2} - q'^2\sin^2\frac{\gamma}{2} = -\cos\gamma$$

where $\gamma = \alpha - \beta$. Thus the gradient of the object must lie on a hyperbola with axis tilted $\lambda$ from the x axis, and with asymptotes perpendicular to the directions of $\alpha$ and $\beta$. This constraint is shown in Fig. 9.46.

To show how skewed symmetry can be exploited to interpret objects with planar faces, reconsider the example of Fig. 9.43. In that example the three convex edges constrained the gradients of the corresponding faces to be at the vertices of a triangle, but the size or position of the triangle in gradient space was unknown. However, skewed symmetry applied to each face introduces three hyperbolas upon which the gradients must lie. The only way that both the skewed symmetry constraint and the triangle constraint can be satisfied simultaneously is shown in Fig. 9.47; the combined constraints have uniquely determined the face orientations.

EXERCISES

9.1 Derive an expression for the volume of an object represented by spherical harmonics of order M = 1.
9.2 Derive an expression for the perpendicular to the surface of an object represented by spherical harmonics in terms of the appropriate derivatives.
9.3 Derive an expression for the angle centroid of each of the spherical harmonic functions for M ≤ 2.
9.4 Label the lines in the objects of Fig. 9.48.

Fig. 9.46 Skewed symmetry constraint in gradient space.



Fig. 9.47 Using skewed symmetry to orient the faces of a cube. (a) The cube. (b) Skewed symmetries. (c) Skewed symmetries and junction constraint plotted in gradient space. (d) Another possible object obeying the constraints.

9.5 Give two sets of CSG primitives with the same domain.
9.6 Show that the dual of the plane of interpretation for a line and the duals of the two planes that meet in the edge causing the line are all on the dual of the edge.
9.7 Prove (Section 9.3.1) that in the Frenet frame [illegible] is perpendicular to [illegible].
9.8 Write the precise rules for combining classification results for ∪*, ∩*, and −* operations.
9.9 Find two interpretations of the tetrahedron of Fig. 9.44a that differ in convexity or concavity of lines. (Hint: The concave interpretation has an accident of alignment.)

Fig. 9.48 Objects for labeling.

REFERENCES

AGIN, G. J. "Representation and description of curved objects" (Ph.D. dissertation). AIM-173, Stanford AI Lab, October 1972.
BADLER, N. I. and R. K. BAJCSY. "Three-dimensional representations for computer graphics and computer vision." Computer Graphics 12, August 1978, 153-160.
BADLER, N. I. and J. O'ROURKE. "Representation of articulable, quasi-rigid, three-dimensional objects." NSF Workshop on the Representation of Three-Dimensional Objects, Univ. Pennsylvania, May 1979.
BARNHILL, R. E. "Representation and approximation of surfaces." In Mathematical Software III, J. R. Rice (Ed.). New York: Academic Press, 1977.
BARNHILL, R. E. and R. F. RIESENFELD. Computer Aided Geometric Design. New York: Academic Press, 1974.
BAUMGART, B. G. "Winged edge polyhedron representation." STAN-CS-320, AIM-179, Stanford AI Lab, October 1972.
BENTLEY, J. L. "Multidimensional search trees used for associative searching." Comm. ACM 18, 9, September 1975, 509-517.
BINFORD, T. O. "Visual perception by computer." IEEE Conf. on Systems and Control, Miami, December 1971.


BOYSE, J. W. "Data structure for a solid modeller." NSF Workshop on the Representation of Three-Dimensional Objects, Univ. Pennsylvania, May 1979.
BROWN, C. M. "Two descriptions and a two-sample test for 3-d vector data." TR 49, Computer Science Dept., Univ. Rochester, February 1979a.
BROWN, C. M. "Fast display of well-tesselated surfaces." Computers and Graphics 4, 2, September 1979b, 77-85.
BROWN, C. M., A. A. G. REQUICHA, and H. B. VOELCKER. "Geometric modelling systems for mechanical design and manufacturing." Proc., 1978 Annual Conference of the ACM, Washington, DC, December 1978, 770-778.
CHAKRAVARTY, I. "A generalized line and junction labelling scheme with applications to scene analysis." IEEE Trans. PAMI, April 1979, 202-205.
CLINTON, J. D. "Advanced structural geometry studies, Part I: Polyhedral subdivision concepts for structural applications." NASA CR-1734/35, September 1971.
CLOWES, M. B. "On seeing things." Artificial Intelligence 2, 1, Spring 1971, 79-116.
COONS, S. A. "Surface patches and B-spline curves." In Computer Aided Geometric Design, R. E. Barnhill and R. F. Riesenfeld (Eds.). (Proc., Conference on Computer Aided Geometric Design, Univ. Utah, March 1974.) New York: Academic Press, 1974.
EJIRI, M., T. UNO, H. YODA, T. GOTO, and K. TAKEYASU. "An intelligent robot with cognition and decision-making ability." Proc., 2nd IJCAI, September 1971, 350-358.
FALK, G. "Interpretation of imperfect line data as a three-dimensional scene." Artificial Intelligence 3, 1, Spring 1972, 77-100.
FORREST, A. R. "On Coons and other methods for the representation of curved surfaces." CGIP 1, 4, December 1972, 341-359.
GUZMAN, A. "Decomposition of a visual scene into three-dimensional bodies" (Ph.D. dissertation). In Automatic Interpretation and Classification of Images, A. Grasselli (Ed.). New York: Academic Press, 1969.
HUFFMAN, D. A. "Impossible objects as nonsense sentences." In MI6, 1971.
JACKINS, C. L. and S. L. TANIMOTO. "Oct-trees and their use in representing three-dimensional objects." CGIP 14, 3, November 1980, 249-270.
KANADE, T. "A theory of Origami world." CMU-CS-78-144, Computer Science Dept., Carnegie-Mellon Univ., 1978.
KANADE, T. "Recovery of the three-dimensional shape of an object from a single view." CMU-CS-79-153, Computer Science Dept., Carnegie-Mellon Univ., October 1979.
LAKATOS, I. Proofs and Refutations. Cambridge, MA: Cambridge University Press, 1976.
LEE, Y. T. and A. A. G. REQUICHA. "Algorithms for computing the volume and other integral properties of solid objects." Tech. Memo 35, Production Automation Project, Univ. Rochester, Rochester NY, February 1980.
MACKWORTH, A. K. "Interpreting pictures of polyhedral scenes." Artificial Intelligence 4, 2, June 1973, 121-137.
MACKWORTH, A. K. "On reading sketch maps." Proc., 5th IJCAI, August 1977, 598-606.
MARKOWSKY, G. and M. A. WESLEY. "Fleshing out wire frames." IBM J. Res. Devel. 24, 1 (January 1980), 64-74.
MARR, D. and H. K. NISHIHARA. "Representation and recognition of the spatial organization of three-dimensional shapes." Proc., Royal Society of London B 200, 1978, 269-294.
NISHIHARA, H. K. "Intensity, visible surface and volumetric representations." NSF Workshop on the Representation of Three-Dimensional Objects, U. Pennsylvania, May 1979.
O'NEILL, B. Elementary Differential Geometry. New York: Academic Press, 1966.


O'ROURKE, J. and N. I. BADLER. "Decomposition of three-dimensional objects into spheres." IEEE Trans. PAMI 1, July 1979.
REQUICHA, A. A. G. "Representations of rigid solid objects." Computer Surveys 12, 4, December 1980.
ROBERTS, L. G. "Machine perception of three-dimensional solids." In Optical and Electro-optical Information Processing, J. P. Tippett et al. (Eds.). Cambridge, MA: MIT Press, 1965.
SCHUDY, R. B. and D. H. BALLARD. "Model detection of cardiac chambers in ultrasound images." TR 12, Computer Science Dept., Univ. Rochester, November 1978.
SCHUDY, R. B. and D. H. BALLARD. "Towards an anatomical model of heart motion as seen in 4-d cardiac ultrasound data." Proc., 6th Conf. on Computer Applications in Radiology and Computer-Aided Analysis of Radiological Images, June 1979.
SHANI, U. "A 3-d model-driven system for the recognition of abdominal anatomy from CT scans." TR 77, Computer Science Dept., U. Rochester, May 1980; also in Proc., 5th IJCPR, Miami, December 1980, 585-591.
SHAPIRA, R. "A technique for the reconstruction of a straight-edge, wire-frame object from two or more central projections." CGIP 3, 4, December 1974, 318-326.
SHIRAI, Y. "Analyzing intensity arrays using knowledge about scenes." In PCV, 1975.
SOROKA, B. I. "Generalised cylinders from parallel slices." Proc., PRIP, 1979a, 421-426.
SOROKA, B. I. "Understanding objects from slices." Ph.D. dissertation, Dept. of Computer and Information Science, Univ. Pennsylvania, 1979b.
SOROKA, B. I. and R. K. BAJCSY. "Generalized cylinders from serial sections." Proc., 3rd IJCPR, November 1976, 734-735.
SUGIHARA, K. "Mathematical structures of line drawings of polyhedra." RNS 81-02, Dept. of Information Science, Nagoya Univ., May 1981.
TILOVE, R. B. "Set membership classification: a unified approach to geometric intersection problems." IEEE Trans. Computers 29, 10, October 1980.
TURNER, K. J. "Computer perception of curved objects using a television camera." Ph.D. dissertation, Univ. Edinburgh, 1974.
VOELCKER, H. B. and A. A. G. REQUICHA. "Boundary evaluation procedures for objects defined via constructive solid geometry." Tech. Memo 26, Production Automation Project, Univ. Rochester, 1981.
VOELCKER, H. B. and A. A. G. REQUICHA. "Geometric modeling of mechanical parts and processes." Computer 10, December 1977, 48-57.
VOELCKER, H. B. and Staff of Production Automation Project. "The PADL-1.0/2 system for defining and displaying solid objects." Computer Graphics 12, 3, August 1978, 257-263.
WALTZ, D. L. "Generating semantic descriptions from drawings of scenes with shadows." Ph.D. dissertation, AI Lab, MIT, 1972; also in PCV, 1975.
WARNOCK, J. G. "A hidden surface algorithm for computer generated half-tone pictures." TR 4-15, Computer Science Dept., Univ. Utah, June 1969.
WESLEY, M. A. and G. MARKOWSKY. "Fleshing out projections." IBM J. Res. Devel. 25, 6 (November 1981), 934-954.


Part IV RELATIONAL STRUCTURES

Visual understanding relates input and its implicit structure to explicit structure that already exists in our internal representations of the world. More specifically, vision operations must maintain and update beliefs about the world, and achieve specific goals. To consider how higher processes can influence and use vision, one must confront the nonvisual world and powers of reasoning that have more general applicability. The world models that are capable of supporting advanced application-dependent calculations about objects in the visual domain are quite complex. General techniques of knowledge representation developed in other fields of artificial intelligence can be brought to bear on them. Similarly, much research has been invested in the basic processes of inference and planning. These techniques may be used in the visual domain to manipulate beliefs and achieve goals, as well as for other kinds of reasoning.

The organization of a complex visual system (Fig. 1.5 or Fig. 10.1) is a loose hierarchy of models of world phenomena. The relational models that concern us in this chapter are removed from direct perceptual experience; they are used mainly for the last, highest-level stages of perception. Also, they are used for knowledge attained prior to the visual experience currently being processed. The representations involved may be analogical or propositional. Analogical representations allow simulations of important physical and geometric properties of objects. Propositions are assertions that are either true or false with respect to the world (or a world model). Each form is useful for different purposes, and one is not necessarily "higher" than the other. The techniques and representations of Part IV are mainly propositional in flavor. Sometimes the reasoning they implement (say about geometrical entities) would seem better suited to analogical calculations; however, technical difficulties can render that impossible.

Part IV is concerned with techniques for making the "motivation" and "world view" of a vision system explicit and available. Such explicit models would

be interesting from a scientific standpoint even if they were not directly useful. But explicitly available models are decidedly useful. They are useful to the system designer who desires to reconfigure or extend a system. They are useful to the system itself, which can use them to reason about its own actions, flexibly control its own resources in accordance with higher goals, dynamically change its goals, recover from mistakes, and so forth. We organize the major topics of Part IV as follows.

1. Knowledge representation (Chapter 10). Semantic nets are an important technique for structuring complex knowledge, and can be used as a knowledge representation formalism in their own right.
2. Matching (Chapter 11). Matching puts a derived representation of an image into correspondence with an existing representation. This style of processing representations is more pronounced as domain-dependent knowledge, idiosyncratic goals, and experience begin to dominate the ultimate use (or understanding) of the visual input.
3. Inference (Chapter 12). Classical logical inference (a technique for manipulating purely propositional knowledge representations) is a well-understood and elegant reasoning technique. It has good formal properties, but occasionally seems restricted in its power to duplicate the range of human processing. Extended inference techniques such as production systems are those in which the inference process as well as the propositions may contribute materially to the derived knowledge. Labeling techniques can "infer" consistent or likely interpretations for an input from given rules about the domain. Inference can be used for both problem solving and belief maintenance activity.
4. Planning (Chapter 13). Planning techniques are useful for problem solving, and are especially tailored to integrating vision with real-world action. Planning can be used for resource allocation and attentional mechanisms.
5. Control (Chapter 10; Appendix 2). Control strategies and mechanisms are of vital concern in any complex artificial intelligence system, and are particularly important when the computation is as expensive as that of vision processing.

Learning is missing from the list above. Disappointing as it is, at this writing the problem of learning is so difficult that we can say very little about it in the domain of vision.


Knowledge Representation and Use

10

10.1 REPRESENTATIONS

An internal representation of the world can help an intelligent system plan its actions and foresee their consequences, anticipate dangers, and use knowledge acquired in the past. In Part IV we investigate the creation, maintenance, and use of a knowledge base, an abstract representation of the world useful for computer vision. Chapter 1 introduced a layered organization for the knowledge base and divided its contents into "analogical" and "propositional" models. In this section we consider this high-level division more deeply.

The outside world is accessible to a computer vision program through the imaging process. Otherwise, the program is manipulating its internal representations, which should correspond to the world in understood ways. In this sense, the knowledge base of generalized images, segmented images, and geometric entities contains "models" of the phenomena in the world. Another, more abstract sense of "model" is high-level, prior expectations about how the world fits together. Such a high-level model is often much more complex than the lower-level representations, often has a large "propositional" component, and is often manipulated by "inference-like" procedures. Explicit knowledge and belief structures are a relatively new phenomenon in computer vision, but are playing an increasingly important role.

The goals of this chapter are three.

1. To develop in more depth some issues of high-level models (Section 10.1).
2. To describe semantic nets, an important and general tool for both organizing and representing models (Sections 10.2 and 10.3).
3. To address issues of control, at both abstract and implementational levels (Section 10.4, augmented by Appendix 2).

10.1.1 The Knowledge Base: Models and Processes

Figure 10.1 shows the representational layers in the knowledge base as we have developed it through the book, and shows the place of important processes. This organization might be compared with that in [Barrow and Tenenbaum 1981].

The knowledge base organization is mirrored in the organization of the book. Parts I to III dealt with analogical models and their construction; Part IV is concerned with propositional and complex analogical models. In Chapters 11 to 13, the emphasis moves from the structure of models to the processes (matching, inference, and planning) needed to manipulate and use them.

The knowledge base should have the following properties.

• Represent analogical, propositional, and procedural structures
• Allow quick access to information
• Be easily and gracefully extensible
• Support inquiries to the analogical structures
• Associate and convert between structures
• Support belief maintenance, inference, and planning
[Diagram not recoverable from the scan. Surviving labels: generalized image (image model, intrinsic image); segmented image (boundaries, regions, texture, motion); geometric representations (two-dimensional, three-dimensional); relational structures (semantic nets, propositions and hypotheses, plans); knowledge base with analogical models (iconic, geometric, procedural) and analogical and propositional models; processes: construction, matching and prediction, matching, inference, planning.]

Fig. 10.1 The knowledge base and associated processes in a computer vision system.


The highest levels of the knowledge base contain both analogical and propositional models. Analogical tools do not exist for many important activities, and when they do exist they are often computationally intensive. A three-dimensional geometric modeling system for automatic manufacturing has very complex data structures and algorithms compared to their elegant and terse counterparts in a propositional model that may be used to plan the highest-level actions. In general it makes sense to do some computation at the analogical level and some at the propositional. This multiple-representation strategy seems more efficient than translating all problems into one representation or the other.

The computations in a vision system should be organized so that information can flow efficiently and unnecessary computation is kept to a minimum. This is the function of the control disciplines that allocate effort to different processes. Even the simplest biological vision systems exhibit sophisticated control of processing. Constructive processes dominate the activity in building lower-level models, and matching processes become more important as prior expectations and models are brought into play. Chapter 11 is devoted to the process of matching.

We postulate that an advanced vision system is engaged in two sorts of high-level activity: belief maintenance and goal achievement. The former is a more or less passive, data-driven, background activity that keeps beliefs consistent and updated. The latter is an active, knowledge-driven, foreground activity that consists of planning future activities. Planning is a problem-solving and simulation activity that anticipates future world states; in computer vision it can determine how the visual environment is expected to change if certain actions are performed. Planning can occur with symbolic, propositional representations (Chapter 13) or in a more analogical vein with such simulations as trajectory planning [Lozano-Perez and Wesley 1979]. Planning is useful as an implementational mechanism even in contexts that are not analogous to human "conscious" problem solving [Garvey 1976]. Helmholtz likened the results of perception to "unconscious conclusions" [Helmholtz 1925]. Similarly, even "primitive" vision processes (computer or biological) may use planning techniques to accomplish their ends.

Inference and planning are both classical subfields of artificial intelligence. Neither has seen much application in computer vision. Inference seems useful for belief maintenance. Extended inference can deal with inconsistent beliefs and with beliefs that are maintained with various strengths. We treat inference in Chapter 12. Applications of planning to vision [Garvey 1976; Bolles 1977] show good promise. Planning is treated in Chapter 13.
10.1.2 Analogical and Propositional Representations

Our division of the internal knowledge base into "analogical" and "propositional" reflects a similar division in theories of how human beings represent the world [Johnson-Laird 1980]. Psychological data are not compelling toward either pure theory; there are indications that human beings use both forms of representation. We introduce the division in this book because we find it conceptually useful in the


following way.Lowlevel representations andprocessestendtobepurelyanalogi cal;highlevelrepresentationsandprocessestendtobebothanalogicalandpropo sitional. Analogical representations have the following characteristics [Kosslyn and Pomerantz 1977;Shepard 1978;Sloman 1971;KosslynandSchwartz 1977,1978; WaltzandBoggess1979]. 1. Coherence. Eachelement ofarepresented situation appearsonce, withallits relationstootherelementsaccessible. 2. Continuity. Analogous with continuity of motion and time in the physical world;theserepresentationspermitcontinuouschange. 3. Analogy.Thestructureoftherepresentation mirrors (andmaybeisomorphic to)therelationalstructureoftherepresentedsituation.Therepresentationisa descriptionofthesituation. 4. Simulation. Analogical modelsareinterrogatedand manipulated byarbitrarily complex computational procedures that often have the flavor of (physical or geometric) simulation. Propositional representations have the following characteristics [Anderson andBower1973;Palmer 1975;Pylyshyn19731. 1. Dispersion. An element ofarepresented situation canappear in severalprop ositions.However, thepropositionscanberepresented inacoherent manner byusingsemanticnets. 2. Discreteness.Propositionsarenotusuallyusedtorepresentcontinuouschange. However, they may be made to approximate continuous values arbitrarily closely. Smallchangesinthe representation canthus bemadetocorrespond tosmallchangesintherepresentedsituation. 3. Abstraction. Propositions are true or false. They do not have a geometric resemblancetothesituation;theirstructureisnotanalogoustothatofthesi tuation. 4. Inference. Propositional modelsaremanipulatedbymoreorlessuniform com putationsthatimplement"rulesofinference" allowingnewpropositionstobe developedfrom oldones. Eachsortofmodelderivesits"meaning"differently; thedistinctionsarein teresting, because they can point out weaknesses ineach theory [JohnsonLaird 1980;Schank 1975;Fodor, etal. 
1975].Especially incomputer implementations, thetworepresentationsonlydiffer essentially inthelasttwopoints.Itisoftenpos sibletotransform onerepresentationtoanotherwithoutlossof information. Some examples are in order. Ageneralized image (Part I) isan analogical model:tofindanobjectaboveagivenobject,aprocedurecan"searchupward"in the image. An unambiguous threedimensional model of a solid (Chapter 9) is analogical. It may be used to calculate many geometric properties of the solid, even those unimagined bythe designer of the representation. Aset of predicate calculusclauses (Chapter 12)isapropositional model.Closelyrelated modelscan beusedtosolveproblemsandmakeplans[Nilsson 1971,1980;Chapter13].
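To make the contrast concrete, here is a minimal Python sketch of the "search upward" example. It is a toy, not from the text: the same three-object scene is held once as a small label image, queried by scanning pixels, and once as a set of ABOVE facts, queried by clause lookup. The image, its one-letter labels, and the function names are illustrative assumptions.

```python
# Toy scene (hypothetical): L = lamp, C = cup, T = table; '.' is background.
# Row 0 is the top of the image.
IMAGE = [
    ".L..",
    ".C..",
    ".T..",
    "....",
]

# Propositional model of the same scene: one clause per asserted fact.
ABOVE = {("L", "C"), ("C", "T")}


def search_upward(image, label):
    """Analogical query: from the topmost pixel of `label`, scan upward
    and return the first other object label met (None if none)."""
    rows = [r for r, line in enumerate(image) if label in line]
    if not rows:
        return None
    top = min(rows)
    col = image[top].index(label)
    for r in range(top - 1, -1, -1):
        if image[r][col] not in (".", label):
            return image[r][col]
    return None


def above_from_facts(facts, obj):
    """Propositional query: every x for which ABOVE(x, obj) is asserted."""
    return {x for x, y in facts if y == obj}


print(search_upward(IMAGE, "C"))     # L
print(above_from_facts(ABOVE, "C"))  # {'L'}
```

The analogical query needs no stored ABOVE relation; the answer is read off the geometry. The propositional model, by contrast, knows only what has been asserted.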
Ch. 10 Knowledge Representation and Use

A short digression: It is interesting that people do not seem to perform syllogistic inference (formal propositional deduction) in a "mechanical" way. Given two clauses such as "Some appliances are telephones" and "All telephones are black," we are much more likely to conclude "Some appliances are black" than the equally valid "Some black things are appliances." There is no satisfying theory of the mental processes underlying syllogistic inference. An interesting speculation [Johnson-Laird 1980] is that inference is primarily done through analogical mental models (in which, for example, a population of individuals is conjured up and manipulated). Syllogistic inference techniques may then have arisen as a bookkeeping mechanism to ensure that analogical reasoning does not "miss any cases."
10.1.3 Procedural Knowledge

Procedures as explicit elements in a model pose problems because they are not readily "understood" by other knowledge-base components. It is very hard to tell what a procedure does by looking at its code. In our taxonomy we think of "procedural" knowledge as being analogical. The sequential nature of a program's steps is analogous to an ordering of actions in time that can only be clumsily expressed in current propositional representations. Knowledge about "how to" perform a complex activity is most propitiously represented in the form of explicit process descriptions. Descriptions not involving the element of time may be naturally represented as passive (analogical or propositional) structures.

There have been several attempts to organize chunks of procedural knowledge by associating with the procedure a description of what it is to accomplish. For example, procedural knowledge can be stored in the internal model structure (knowledge base) indexed under patterns that correspond to the arguments of the procedure. Pattern-directed invocation involves going to the knowledge base for a procedure that matches the given pattern, matching pattern elements to bind arguments, and invoking the procedure. Pattern-directed invocation has several advantages, such as not having to know the "proper names" of procedures, only their descriptions (what they claim to do). Also, when several procedures match a pattern, one either gets nondeterminism or a chance to choose the best. Systems often provide a facility for choosing the best procedure dynamically. Similar pattern matching is involved in resolution theorem provers and production systems (Chapter 12).

As an example, in a program to locate ribs in a chest radiograph [Ballard 1978], procedures to find ribs under different circumstances are attached to nodes in a mixed analogical and propositional model of the rib cage, as shown in Fig. 10.2. Each procedure has an associated description which determines whether it can be run.
For example, some programs require instances of neighboring ribs to be located before they can run, whereas others can run given only rudimentary scaling information. When invoked, each procedure tries to find a geometric structure corresponding to the associated rib in a radiograph. Instead of searching for ribs in a mechanical order, descriptors allow a choice of order and procedures and hence a more flexible, efficient, and robust program (Appendix 2).

Fig. 10.2 A portion of a rib cage model (see text). Procedural attachment to a model is denoted by jagged lines.

The representation and use of procedural knowledge is an important topic [Schank and Abelson 1977; Winograd 1975; Freuder 1975]. We expect it to become increasingly important for computer vision.
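Pattern-directed invocation, as described above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the rib-finding system of [Ballard 1978]. Procedures are filed under a goal pattern together with a precondition, and callers request a goal instead of naming a procedure. The `?k` variable syntax, the priority scheme used to "choose the best," and the two rib procedures are all hypothetical.

```python
# Hypothetical sketch: procedures indexed by goal pattern, guarded by a
# precondition; callers ask for a goal, never for a procedure by name.
KNOWLEDGE_BASE = []  # entries: (pattern, precondition, procedure, priority)


def match(pattern, goal):
    """Match a tuple pattern against a goal tuple. Elements starting with '?'
    are variables; returns {name: value} bindings, or None on failure."""
    if len(pattern) != len(goal):
        return None
    bindings = {}
    for p, g in zip(pattern, goal):
        if p.startswith("?"):
            if bindings.setdefault(p[1:], g) != g:
                return None
        elif p != g:
            return None
    return bindings


def register(pattern, precondition, priority=0):
    """Decorator: file a procedure in the knowledge base under `pattern`."""
    def wrap(proc):
        KNOWLEDGE_BASE.append((pattern, precondition, proc, priority))
        return proc
    return wrap


def invoke(goal, state):
    """Run the highest-priority procedure whose pattern matches `goal` and
    whose precondition holds in `state`; return None if none applies."""
    best = None
    for pattern, precondition, proc, priority in KNOWLEDGE_BASE:
        b = match(pattern, goal)
        if b is not None and precondition(state, **b):
            if best is None or priority > best[0]:
                best = (priority, proc, b)
    if best is None:
        return None
    _, proc, b = best
    return proc(state, **b)


# Hypothetical rib finders, in the spirit of the rib cage model of Fig. 10.2.
@register(("find", "rib", "?k"), lambda state, k: True)
def rib_from_scale(state, k):
    return f"rib {k} located from rough scaling"


@register(("find", "rib", "?k"),
          lambda state, k: {k - 1, k + 1} & state["located"], priority=1)
def rib_from_neighbor(state, k):
    return f"rib {k} located from a neighboring rib"


print(invoke(("find", "rib", 3), {"located": {4}}))
```

Here the choice among matching procedures is a fixed priority. A real system might rank candidates dynamically.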
10.1.4 Computer Implementations

A computer implementation can (and often does) obscure the sharp divisions imposed by pure philosophical differences between analogical and propositional models. A propositional representation need not be an unordered set of clauses, but may have a coherent structure; the coherent versus dispersed distinction is thus blurred. A geometry theorem prover or a block-stacking program may manipulate diagrams or simulate physical phenomena such as gravitational stability and wobble in the manipulator [Gelernter 1963; Fahlman 1974; Funt 1977]. "Nonstandard inference" is an important tool that extends classical inference techniques. Although techniques such as production systems and relaxation labeling algorithms (Chapter 11) bear little superficial resemblance to predicate logic, both may be naturally used to manipulate propositional models.

Propositions may be implemented as procedures. If a proposition "evaluates" to true or false, it is perhaps most naturally considered a function from a world (or world model) to a truth value. This is not to say that all such functions exist or are evaluated when the proposition is "brought to mind"; perhaps "understanding a proposition" is like compiling a function and "verifying a proposition" is like evaluating it. The function may be implicit in an evaluation (inference) mechanism or more explicit, as in a "procedural" semantics such as that of the programming languages PLANNER and CONNIVER [Hewitt 1972; Sussman and McDermott 1972; Winograd 1978]. A proposition may thus be encoded as an (analogical!) procedural recipe for establishing the proposition. An example might be this representation of the fact "In California, Grass and Trees produce green regions."

    (To-Establish (Green-Region x)
        (Establish (AND (In-California)
                        (OR (Establish (Grass x))
                            (Establish (Trees x))))))

This might mean: To infer that x is a green region, establish that you are in California and then try to establish that x arose from grass. Should the grass inference fail, try to establish that x arose from trees. Since the full power of the programming language is available to an Establish statement, it can perform general computations to establish the inference.

The important point here: Rather than a set of clauses whose application must be organized by an interpreter, propositions may be represented by an explicit control sequence, including procedure calls to other programs. In the example, (Grass x) and (Trees x) may be procedures which have their own complicated control structures.

To say that in a computer "everything is propositions" is a truism; any program can be reduced to a Turing machine described by a finite set of "propositions" with a very simple rule of "inference." The issue is at what level the program should be described. A program may be doing propositional resolution theorem proving or analogical trajectory planning with three-dimensional models; it is not helpful to blur this basic functional distinction by appealing to the lowest implementational level.
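A rough Python analogue of the To-Establish recipe shows how the control order (California first, grass before trees) becomes explicit code rather than something an interpreter must organize. This is a sketch under assumptions: the `world` dictionary and the helper names `in_california`, `grass`, and `trees` are hypothetical stand-ins for what could be arbitrarily complex image procedures. The `trace` list only records which steps were attempted.

```python
# Hypothetical stand-ins for procedures with their own control structures.
def in_california(world):
    return world.get("state") == "CA"


def grass(world, region):
    return region in world.get("grass_regions", set())


def trees(world, region):
    return region in world.get("tree_regions", set())


def establish_green_region(world, region, trace):
    """To establish (Green-Region x): establish In-California, then try
    Grass; only if that fails, try Trees."""
    trace.append("in-california")
    if not in_california(world):
        return False
    trace.append("grass")
    if grass(world, region):
        return True
    trace.append("trees")  # fallback, reached only if the grass inference fails
    return trees(world, region)


world = {"state": "CA", "grass_regions": {"r1"}, "tree_regions": {"r2"}}
steps = []
print(establish_green_region(world, "r2", steps), steps)
# True ['in-california', 'grass', 'trees']
```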

10.2 SEMANTIC NETS

10.2.1 Semantic Net Basics

Semantic nets were first introduced under that name as a means of modeling human associative memory [Quillian 1968]. Since then they have received much attention [Nilsson 1980; Woods 1975; Brachman 1976; Findler 1979]. We are concerned with three aspects of semantic nets.

1. Semantic nets can be used as a data structure for conveniently accessing both analogical and propositional representations. For the latter their construction is straightforward and based solely on propositional syntax (Chapter 12).
2. Semantic nets can be used as an analogical structure that mirrors the relevant relations between world entities.
3. Semantic nets can be used as a propositional representation with special rules of inference. Both classical and extended inference can be supported, but it is a challenging enterprise to design net structure that provides the properties of formal logic [Schubert 1976; Hendrix 1979].

Sec. 10.2 Semantic Nets

A semantic network represents objects and relationships between objects as a graph structure of nodes and (labeled) arcs. The arcs usually represent relations between nodes and may be "followed" to proceed from node to node. A directed arc with label L between nodes X and Y can signify that the predicate L(X, Y) is true. If, in addition, it has a value V, the arc can signify that some function or relation holds: L(X, Y) = V.

The indexing property of a network is one of its useful aspects. The network can be constructed so that objects that are often associated in computations, or are especially relevant or conceptually close to each other, are represented by nodes that are near each other in the network (as measured by the number of arcs separating them). Figure 10.3 shows these ideas: (a) nodes can be associated by searching outward along arcs and (b) nodes near a specified node are readily available by following arcs. Semantic networks are especially attractive as analogical representations of spatial states of affairs. If we restrict ourselves to binary spatial relations ("above" and "west of," for example), physical objects or parts of objects may be represented by nodes, and their positions with respect to each other by arcs.

Fig. 10.3 Semantic networks as structures for associative search. (a) Associating two nodes. (b) Retrieving nearby nodes.

Let us look at a semantic net and make some basic observations. Figure 10.4 is meant to be an analogical representation of an arrangement of chairs around a table. The LEFT-OF and RIGHT-OF relations are directed arcs; the ADJACENT relation is undirected, and there can be several such undirected arcs between nodes. Note here that the LEFT-OF and RIGHT-OF relations do not behave in their normal way. If they are transitive, as is normal, then every chair is both LEFT-OF and RIGHT-OF every other chair, because the chairs form a cycle around the table. Flexible treatment of this sort of phenomenon is sometimes difficult in propositional representations.

Fig. 10.4 A representation of chairs at a table.

A simple but basic point: The net of Fig. 10.4 seems to say interesting things about furniture in a scene. But notice that merely by rewriting labels the same net could be "about" modular arithmetic, a string of pearls, or any number of things. There are two morals here. First, a sparsely connected representation (analogical or propositional) may have several equally good interpretations. Second, a net without any interpretation procedures essentially represents nothing [McDermott 1976].

Now consider three neighboring chairs described by the following relations.

1. CHAIR(Armchair), CHAIR(Highchair), CHAIR(Stool)
2. WIDE(Armchair)
3. HIGH(Highchair)
4. LOW(Stool)
5. LEFT-OF(Armchair, Highchair)
6. LEFT-OF(Highchair, Stool)
7. BETWEEN(Highchair, Armchair, Stool)
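The two uses of a net described above, asserting L(X, Y) with a labeled arc and retrieving "nearby" nodes by following arcs outward (Fig. 10.3b), can be sketched with a small Python class. This is a minimal sketch, not a standard implementation. Properties are stored as arcs with a hypothetical HAS-PROPERTY label, in the style of Fig. 10.5b, and only the binary relations from the list are entered.

```python
from collections import defaultdict, deque


class SemanticNet:
    """Nodes are strings; a directed arc labeled L from X to Y asserts
    L(X, Y). Arcs may also be followed in either direction for association."""

    def __init__(self):
        self.arcs = set()                  # (label, x, y) triples
        self.neighbors = defaultdict(set)  # for outward (associative) search

    def add(self, label, x, y):
        self.arcs.add((label, x, y))
        self.neighbors[x].add(y)
        self.neighbors[y].add(x)

    def holds(self, label, x, y):
        """True only if the arc is stored explicitly (no inference)."""
        return (label, x, y) in self.arcs

    def nearby(self, start, max_arcs):
        """Nodes within `max_arcs` arcs of `start`, by breadth-first search."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == max_arcs:
                continue
            for nxt in self.neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return {n for n, d in dist.items() if d > 0}


net = SemanticNet()
net.add("LEFT-OF", "Armchair", "Highchair")
net.add("LEFT-OF", "Highchair", "Stool")
net.add("HAS-PROPERTY", "Armchair", "Wide")
net.add("HAS-PROPERTY", "Highchair", "High")
net.add("HAS-PROPERTY", "Stool", "Low")
print(net.holds("LEFT-OF", "Armchair", "Stool"))  # False: never asserted
print(sorted(net.nearby("Armchair", 2)))          # ['High', 'Highchair', 'Stool', 'Wide']
```

Note that `holds` performs no inference. Whether LEFT-OF should be treated as transitive is left to an interpreter, which is exactly the issue raised by Fig. 10.4.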

The relations include four properties (relations with "one argument"), a two-argument relation, and a three-argument relation. One way to encode this information in a net is shown in Fig. 10.5a. Nodes represent individuals, and properties are kept as node contents. The directed arcs represent only binary relations, and "betweenness" is left implicit. Properties can equally well be represented as labeled arcs (Fig. 10.5b).

Relations are encoded as nodes in Fig. 10.6. Here the BETWEEN relation is encoded asymmetrically: it is not possible to tell by arcs emanating from the stool that it is in a "between" relationship.
Fig. 10.5 (a) A simple semantic net. (b) An equivalent net.

The three-place relation is treated more symmetrically in Fig. 10.7. In general, n-place relations may be "binarized" this way: create a node for the "relation instance" and new (relation) nodes for each distinct argument role in the n-ary relation. An important point: Arcs and nodes had a uniform semantics in Fig. 10.4. This property was lost in the succeeding nets; nodes are either "things" or relations, and arcs leading into relations are not the same as those leading out. For such nets to be useful, the net interpreter (a program that manipulates the net) must keep these things straight. It is possible but not easy to devise a rich and uniform network semantics [Brachman 1979].
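The binarization recipe just described can be sketched in a few lines of Python: one node per relation instance, plus one role-labeled binary arc per argument. The role names (MIDDLE, END-1, END-2), the INSTANCE-OF arc, and the `BETWEEN#n` naming scheme are illustrative assumptions, not the labels of Fig. 10.7. Because every argument now has its own role arc, the stool's participation in BETWEEN becomes recoverable, unlike in the asymmetric encoding of Fig. 10.6.

```python
import itertools

_instance_ids = itertools.count(1)


def binarize(relation, roles, args):
    """Encode relation(args...) with binary arcs only: create a node for this
    relation instance and one role-labeled arc from it to each argument."""
    if len(roles) != len(args):
        raise ValueError("need exactly one role name per argument")
    instance = f"{relation}#{next(_instance_ids)}"
    arcs = [("INSTANCE-OF", instance, relation)]
    arcs += [(role, instance, arg) for role, arg in zip(roles, args)]
    return instance, arcs


def roles_of(arcs, node):
    """Which relation instances point at `node`, and in which role."""
    return {(role, src) for role, src, dst in arcs if dst == node}


inst, arcs = binarize("BETWEEN", ("MIDDLE", "END-1", "END-2"),
                      ("Highchair", "Armchair", "Stool"))
print(roles_of(arcs, "Stool"))  # {('END-2', 'BETWEEN#1')}
```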

Fig. 10.6 A net with more explicit information.

Fig. 10.7 A net with yet more explicit information.

10.2.2 Semantic Nets for Inference

This section explores some further important issues in the semantics of semantic nets. In Chapter 12 semantic nets are used as an indexing mechanism in predicate calculus theorem proving. In some applications an inference system with provably good formal properties may be too restrictive. Some formal properties (such as maintaining consistency by not deducing contradictions) may be considered vital, however. How can "good behavior" be obtained from a representation that may contain "inconsistent" information?

One example of an "inconsistent" representation is the net of Fig. 10.4, with its LEFT-OF and RIGHT-OF problem. Another example is a net version of the propositions "All birds fly," "Penguins are birds," and "Penguins do not fly." The generalization is useful "commonsense" knowledge, but the rare exceptions may be important, too. Network interpreters can cope with these sorts of problems by a number of methods, such as only accessing a consistent subnetwork, making deductions from the particular toward the general (this takes care of penguins), and so forth. All these techniques depend on the structure imposed by the net. Some more subtle aspects of net representations appear below.
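The "particular toward the general" strategy can be sketched in Python as follows: walk ISA links upward from the individual and stop at the first, most specific, node that asserts the property. The dictionaries below form a hypothetical net. This is one simple policy for exceptions, not a general solution to inconsistency.

```python
# Hypothetical net: ISA links and the properties asserted locally at each node.
ISA = {"Tweety": "Penguin", "Penguin": "Bird", "Robin": "Bird", "Bird": "Animal"}
PROPS = {
    "Bird": {"flies": True},      # the useful generalization
    "Penguin": {"flies": False},  # the exception, stored lower in the hierarchy
    "Animal": {"alive": True},
}


def lookup(node, prop):
    """Walk ISA links from the individual toward the general; the first
    (most specific) node asserting `prop` wins. None means unknown."""
    seen = set()
    while node is not None and node not in seen:  # guard against ISA cycles
        seen.add(node)
        if prop in PROPS.get(node, {}):
            return PROPS[node][prop]
        node = ISA.get(node)
    return None


print(lookup("Tweety", "flies"), lookup("Robin", "flies"))  # False True
```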

Nodes

The basic notation of Fig. 10.4 may tempt us to produce a net such as that shown in Fig. 10.8. Consider the object node sky in Fig. 10.8. Does it stand for the generic sky concept or for a particular sky at a particular time and location? Clearly both meanings cannot be embodied in the same node because they are used in such different ways in reasoning. The standard solution is to use nodes to differentiate between a type, or generic concept, and a token, or instance of it. Figure 10.9 shows this modification, using the ∈ (element of) relation to relate the individual to the generic concept. In this simple case, the node sky stands for the type, and the empty node stands for a token, or instance of the sky concept.

The distinction between type and token is related to the distinction between intensional and extensional concepts. In analyzing an aerial image there is a difference between

    "All bridges span roads or rivers."    (10.1)

and

    "All bridges (found so far) span roads or rivers."    (10.2)

If "bridges" in (10.1) means any bridge that might be found, "bridges" is used in an intensional sense. If "bridges" means a particular set, it is used in an extensional sense. Normally relations between type nodes are used in an intensional sense and relationships between token nodes have the extensional sense.

Virtual nodes are objects that are not explicitly represented as object nodes. The need for them arises in expressing complex relations. For example, consider

    "The bridge that is at the intersection of road 57 and river 3 is near building 30."    (10.3)

which may be represented as shown in Fig. 10.10. The node labeled x is the result of intersecting a particular road with a particular river. It is not represented explicitly as an instance of any generic concept; it is a virtual node. Virtual nodes can be eliminated by introducing very complex relations, but this would sacrifice an important property of networks, the ability to build up a very large number of complex relations from a small set of primitives. Virtual nodes enhance this ability by referring to portions of complicated relations.

Fig. 10.8 Type or token nodes?

Fig. 10.9 Distinguishing between types and tokens. (a) Tokenizing an instance. (b) Tokenizing an assertion.

Nodes in the network can also be used as variables. These variables can match other nodes which represent constants. In Fig. 10.11, x and y are variables and the rest of the nodes are constants. If node x is matched to the "telephone" node, then x can be regarded as a "telephone" node.


Fig. 10.10 Virtual nodes. (∈ = element of)

Fig. 10.11 Nodes as variables. (a) Black telephone and pen on desk. (b) Object denoted by variable x with variable color y.
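Matching variable nodes against constant nodes, as in Fig. 10.11b, amounts to finding consistent bindings for x and y. The Python sketch below stores the net as (label, from, to) triples and backtracks over them. The triples, the `?x` naming convention, and the node names (T1, P1, Desk1) are hypothetical, chosen only to mirror the telephone-and-pen example.

```python
def unify(pattern, fact, bindings):
    """Extend `bindings` so that `pattern` equals `fact`, or return None.
    Pattern elements starting with '?' are variable nodes."""
    if len(pattern) != len(fact):
        return None
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return b


def match_all(patterns, facts, bindings=None):
    """Every consistent set of bindings that maps each pattern onto a fact."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        return [bindings]
    results = []
    for fact in facts:
        b = unify(patterns[0], fact, bindings)
        if b is not None:
            results.extend(match_all(patterns[1:], facts, b))
    return results


# Hypothetical arcs as (label, from, to) triples, in the spirit of Fig. 10.11.
FACTS = [
    ("ELEMENT-OF", "T1", "Telephone"),
    ("ON", "T1", "Desk1"),
    ("COLOR", "T1", "Black"),
    ("ELEMENT-OF", "P1", "Pen"),
    ("ON", "P1", "Desk1"),
]
# "Object x on the desk, with color y" (x and y are variable nodes).
print(match_all([("ON", "?x", "Desk1"), ("COLOR", "?x", "?y")], FACTS))
# [{'?x': 'T1', '?y': 'Black'}]
```

The pen also satisfies the ON arc, but it is rejected because no COLOR arc leaves P1.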

Often, it is useful to have numerical values as node properties. This can extend the discrete representation of nodes and arcs to a continuous one. For example, in addition to "color of x is red37" we may also associate the particular value of red that we mean with node red37. A special kind of value is a default value. If a value can be found for the node in the course of matching other nodes with values or by examining image data, then that value is used for the node value. Otherwise, the default is used.

Relations

Complex relations of many arguments are not uncommon in the world, but for the bulk of practical work, relations of only a few arguments seem to suffice. Semantic nets can clearly represent two-argument relations through their nodes and arcs. More complex relations may be dealt with by various devices. The links to multiple arguments may be ordered within a relation node, or new nodes may be introduced to label the roles of multiple arguments (Fig. 10.7).

If inference mechanisms are to manipulate semantic nets, certain important relations deserve special treatment. One such relation is the "ISA" relation. The basic issue addressed by this relation is property inheritance [Moore 1979]. That is, if Fred ISA Camel and a Camel ISA Mammal, then presumably Fred has the properties associated with mammals. It often seems necessary to differentiate between various senses of "ISA." One basic sense of "X ISA Y" is "X is an element of the set Y"; others are "X denotes Y," "X is a subset of Y," and "Y is an abstraction of X." Notice that each sense depends on differently "typed" arguments; in the first three cases X is, respectively, an individual, a name, and a set. Deeper treatments of these issues are readily available [Brachman 1979; Hayes 1977; Nilsson 1980].

It is particularly helpful to have a denotation link to keep perceptual structures separate from model structures. Then if mistakes are made by the vision automaton, a correction mechanism can either sever the denotation link completely or create a new denotation link between the correct model and image structures.

When dealing with many spatial relations, it is economical to recognize that many relations are "inverses" of each other. That is, LEFT-OF(x, y) is the "inverse" of RIGHT-OF(x, y):

    LEFT-OF(x, y) <=> RIGHT-OF(y, x)

and also

    ADJACENT(x, y) <=> ADJACENT(y, x)

Rather than double the number of these kinds of links, one can normalize them. That is, only one half of the inverse pair is used, and the interpreter infers the inverse relation when necessary.

Properties have a different semantics depending on the type of object that has the property. An "abstract" node can have a property that gives one aspect or refinement of the represented concept. A property of a "concrete" node presumably means an established and quantified property of the individual.

Partitions

Partitions are a powerful notion in networks. "Partition" is not used in the sense of a mathematical partition, but in the sense of a barrier. Since the network is a graph, it contains no intrinsic method of delimiting subgraphs of nodes and arcs. Such subgraphs are useful for two reasons:

1. Syntactic. It is useful to delimit that part of the network which represents the results of specific inferences.
2. Semantic. It is useful to delimit that part of the network which represents knowledge about specific objects. Partitions may then be used to impose a hierarchy upon an otherwise "flat" structure of nodes.

The simple way of representing partitions in a net is to create an additional node to represent the partition and introduce additional arcs from that node to every node or arc in the partition. Partitions allow the nodes and relations in them to be manipulated as a unit.
Notationally, it is cleaner to draw a labeled boundary enclosing the relevant nodes (or arcs). An example is shown in Fig. 10.12, where we consider two objects, each made up of several parts, with one object entirely left of the other. Rather than use a separate LEFT-OF relation for each of the parts, a single relation can be used between the two partitions. Any pair of parts (one from each object) should inherit the LEFT-OF relation. Partitions may be used to implement quantification in semantic net representations of predicate calculus [Hendrix 1975, 1979]. They may be used to implement frames (Section 10.3.1).
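The following Python sketch combines the partition idea of Fig. 10.12 with the inverse normalization described in the preceding paragraphs. A single LEFT-OF arc is stored between two partitions. Parts inherit it, and RIGHT-OF is never stored but derived by swapping arguments. The object and part names are hypothetical.

```python
# Partitions: named groups of part nodes (hypothetical objects).
PARTITIONS = {
    "Table": {"top", "leg1", "leg2"},
    "Chair": {"seat", "back", "chair_leg"},
}
# One stored LEFT-OF arc between partitions, not one per pair of parts.
STORED = {("LEFT-OF", "Table", "Chair")}
# Normalization: only one half of each inverse pair is ever stored.
INVERSE = {"RIGHT-OF": "LEFT-OF"}


def owner(node):
    """Return the partition containing `node`, or the node itself."""
    for name, members in PARTITIONS.items():
        if node in members:
            return name
    return node


def holds(rel, x, y):
    """Check rel(x, y): rewrite an inverse relation to its stored half, then
    let parts inherit any relation asserted between their partitions."""
    if rel in INVERSE:
        rel, x, y = INVERSE[rel], y, x
    candidates = {(x, y), (owner(x), y), (x, owner(y)), (owner(x), owner(y))}
    return any((rel, a, b) in STORED for a, b in candidates)


print(holds("LEFT-OF", "leg1", "seat"), holds("RIGHT-OF", "back", "top"))  # True True
```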

Fig. 10.12 The use of partitions. (a) Construction of a partition. (b) Two objects described by partitions.

Conversions

It is important to be able to transform from geometric (and logical) representations to propositional abstract representations and vice versa. For example, in Fig. 10.13 the problem is to find the exact location of a telephone on a previously located desk. In this case, propositional knowledge that telephones are usually on desktops, together with the desktop location and knowledge about the size of telephones, defines a search area in the image.

Fig. 10.13 Search area defined by relational bindings.

Converting image data about a particular group of objects into relational form involves the inverse problem. The problem is to perform a level of abstraction to remove the specificity of the geometric knowledge and derive a relation that is appropriate in a larger context. For example, the following program fragment creates the relation ABOVE(A, B), where A and B are world objects.

    Comment: assume a world coordinate system where Z is the positive vertical.
    Find ZAmin, the minimum Z in A, and ZBmax, the maximum Z in B.
    If ZAmin > ZBmax, then make ABOVE(A, B) true.

Many other definitions of ABOVE, one of which compares centers of gravity, are possible. In most cases, the conversion from continuous geometric relations to discrete propositional relations involves more or less arbitrary conventions. To appreciate this further, consult Fig. 10.14 and try to determine in which of the cases block A is LEFT-OF block B. Figure 10.14d shows a case where different answers are obtained depending on whether a two-dimensional or three-dimensional interpretation is used.

Fig. 10.14 Examples to demonstrate difficulties in encoding spatial relation LEFT-OF (see text).

Also, when relations are used to encode what is usually true of the world, it is often easy to construct a counterexample. Winston [Winston 1975] used

    SUPPORTS(B, A) => ABOVE(A, B)

which is contradicted by Fig. 10.15, given the previous definition of ABOVE. One common way around these problems is to associate quantitative, "continuous" information with relations (Section 10.3.2 and later examples).
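The ABOVE fragment translates directly into Python. The sketch below also includes the alternative center-of-gravity convention mentioned in the text, approximated here by the mean Z of sample points (an assumption). Objects are hypothetical lists of (x, y, z) points. Even the strict inequality is a convention: under it, a block resting directly on another is not ABOVE it. The "cap" object, which rests on B but hangs down its sides, shows the two definitions disagreeing, and it is also a case where B supports A but ABOVE(A, B) is false.

```python
def above(a_points, b_points):
    """ABOVE(A, B) per the program fragment: Z is the positive vertical, and
    the lowest point of A must be strictly higher than the highest of B."""
    za_min = min(z for _, _, z in a_points)
    zb_max = max(z for _, _, z in b_points)
    return za_min > zb_max


def above_by_centroid(a_points, b_points):
    """Alternative convention: compare centers of gravity, approximated
    (an assumption) by the mean Z of the sample points."""
    def mean_z(points):
        return sum(z for _, _, z in points) / len(points)
    return mean_z(a_points) > mean_z(b_points)


# Hypothetical objects, given by corner points in the x-z plane.
B = [(0, 0, 0), (4, 0, 0), (0, 0, 10), (4, 0, 10)]            # box, z 0..10
A_float = [(0, 0, 11), (4, 0, 11), (0, 0, 15), (4, 0, 15)]    # clear of B
A_cap = [(-1, 0, 4), (5, 0, 4), (-1, 0, 12), (5, 0, 12)]      # rests on B, sides hang to z = 4

print(above(A_float, B), above(A_cap, B), above_by_centroid(A_cap, B))  # True False True
```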


10.3 SEMANTIC NET EXAMPLES

Examples of semantic nets abound throughout Part IV. Two more examples illustrate the power of the notions. The first example is described very generally, the second in detail.
10.3.1 Frame Implementations

Frame system theory [Minsky 1975] is a way of explaining our quick access to important aspects of a (perhaps perceptual) situation. It is a provocative and controversial idea, and the reader should consult the References for a full treatment. Implementationally, a frame may be realized by a partition; a frame is a "chunk" of related structure.

Associating related "chunks" of knowledge into manipulable units is a powerful and widespread idea in artificial intelligence [Hayes 1980; Hendrix 1979] as well as psychology. These chunks go by several names: units, frames, partitions, schemata, depictions, scripts, and so forth [Schank and Abelson 1977; Moore and Newell 1973; Roberts and Goldstein 1977; Hayes 1977; Bobrow and Winograd 1977, 1979; Stefik 1979; Lehnert and Wilks 1979; Rumelhart et al. 1972].

Frame systems incorporate a theory of associative recall in which one selects frames from memory that are relevant to the situation in which one finds oneself. These frames include several kinds of information. Most important, frames have slots which contain details of the viewing situation. Frame theory dictates a strictly specific and prototypical structure for frames. That is, the number and type of slots for a particular type of frame are immutable and specified in advance. Further, frames represent specific prototype situations; many slots have default values; this is where expectations and prior knowledge come from. These default values may be disconfirmed by perceptual evidence; if they are, the frame can contain information about what actions to take to fill the slot. Some slots are to be filled in by investigation. Thus a frame is a set of expectations to be confirmed or disconfirmed and actions to pursue in various contingencies. One common action is to "bring in another frame."

Fig. 10.15 A counterexample to SUPPORTS(B, A) => ABOVE(A, B).

The theory is that based on a partial match of a frame's defining slots, a frame can be "brought to mind." The retrieval is much like jumping to a conclusion based on partial evidence. Once the frame is proposed, its slots must be matched up with reality; thus we have the initial major hypothesis that the frame represents, which itself consists of a number of minor subhypotheses to be verified. A frame may have other frames in its slots, and so frames may be linked into "frame systems" that are themselves associatively related. (Consider, for example, the linked perceptual frames for being just outside a theater and for being just inside.) Transformations between frames correspond to the effects of relevant actions. Thus the hypotheses can suggest one another. "Thinking always begins with suggestive but imperfect plans and images; these are progressively replaced by better but usually still imperfect ideas" [Minsky 1975].

Frame theory is controversial and has its share of technical problems [Hinton 1977]. The most important of these are the following.

1. Multiple instances of concepts seem to call for copying frames (since the instances may have different slot fillers). Hence, one loses the economy of a preexisting structure.
2. Often, objects have variable numbers of components (wheels on a truck, runways in an airport). The natural representation seems to be a rule for constructing examples, not some specific example.
3. Default values seem inadequate to express legal ranges of slot-filling values or dependencies between their properties.
4. Property inheritance is an important capability that semantic nets can implement with "is a" or "element of" hierarchies. However, such hierarchies raise the question of which frame to copy when a particular individual is being perceived. Should one copy the generic Mammal frame or the more specific Camel frame, for instance? Surely, it is redundant for the Camel frame to duplicate all the slots in the Mammal frame. Yet our perceptual task may call for a particular slot to be filled, and it is painful not to be able to tell where any particular slot resides.

Nevertheless, where these disadvantages can be circumvented or are irrelevant, frames are seeing increasing use. They are a natural organizing tool for complex data.
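A minimal Python sketch can show how an implementation might sidestep problems 1 and 4 above. Slots with defaults live in frames. A frame defers to its parent for slots it lacks, so Camel need not duplicate Mammal, and `default_for` reports where a slot resides. A perceived individual records only the slots actually filled by evidence, so no frame is copied. The class names, slot names, and "observed"/"expected" statuses are illustrative assumptions, not Minsky's formulation.

```python
class Frame:
    """A prototype with named slots and default fillers. Slots missing here
    are looked up in the parent frame, so Camel need not duplicate Mammal."""

    def __init__(self, name, parent=None, defaults=None):
        self.name = name
        self.parent = parent
        self.defaults = dict(defaults or {})

    def default_for(self, slot):
        """Return (default value, name of the frame where the slot resides)."""
        frame = self
        while frame is not None:
            if slot in frame.defaults:
                return frame.defaults[slot], frame.name
            frame = frame.parent
        raise KeyError(f"no slot {slot!r} in {self.name} or its ancestors")


class Instance:
    """A perceived individual: it records only the slots actually filled,
    instead of copying the whole frame."""

    def __init__(self, frame):
        self.frame = frame
        self.filled = {}

    def fill(self, slot, value):
        self.frame.default_for(slot)  # raises KeyError for an unknown slot
        self.filled[slot] = value     # perceptual evidence overrides the default

    def get(self, slot):
        """Return (value, status): 'observed' if filled, else 'expected'."""
        if slot in self.filled:
            return self.filled[slot], "observed"
        value, _ = self.frame.default_for(slot)
        return value, "expected"


mammal = Frame("Mammal", defaults={"legs": 4, "covering": "hair"})
camel = Frame("Camel", parent=mammal, defaults={"humps": 2})
fred = Instance(camel)
print(fred.get("legs"))           # (4, 'expected')
print(camel.default_for("legs"))  # (4, 'Mammal')
fred.fill("humps", 1)
print(fred.get("humps"))          # (1, 'observed')
```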
10.3.2 Location Networks

This section describes a system for associating geometric analogical data with a semantic net structure which is sometimes like a frame with special "evaluation" rules. The system is a geometrical inference mechanism that computes (or infers) two-dimensional search areas in an image [Russell 1979]. Such networks have found use in both aerial image applications [Brooks and Binford 1980; Nevatia and Price 1978] and medical image applications [Ballard et al. 1979].
Sec. 10.3 Semantic Net Examples

The Network

A location network is a network representation of geometric point sets related by set-theoretic and geometric operations such as set intersection and union, distance calculation, and so forth. The operations correspond to restrictions on the location of objects in the world. These restrictions, or rules, are dictated by cultural or physical facts.

Each internal node of the location network contains a geometric operation, a list of arguments for the operation, and a result of the operation. For instance, a node might represent the set-theoretic union of two argument point sets, and the result would be a point set. Inference is performed by evaluating the net: evaluating all its operations to derive a point set for the top (root) operation. The network thus has a hierarchy of ancestors and descendants imposed on it through the argument links. At the bottom of this hierarchy are data nodes which contain no operation or arguments, only geometric data. Each node is in one of three states: A node is up-to-date if the data attached to it are currently considered to be accurate. It is out-of-date if the data in it are known to be incomplete, inaccurate, or missing. It is hypothesized if its contents have been created by net evaluation but not verified in the image.

In a common application, the expected relative locations of features in a scene are encoded in a network, which thus models the expected structure of the image. The primitive set of geometric relations between objects is made up of four different types of operations.

1. Directional operations (left, reflect, north, up, down, and so on) specify a point set with the obvious locations and orientations with respect to another.
2. Area operations (close-to, in-quadrilateral, in-circle, and so on) create a point set with a nondirectional relation to another.
3. Set operations (union, difference, and intersection) perform the obvious set operations.
4. Predicates on areas allow point sets to be filtered out of consideration by measuring some characteristic of the data. For example, a predicate testing width, length, or area against some value restricts the size of sets to be those within a permissible range.

The location of the aeration tank in a sewage treatment plant provides a specific example. The aeration tank is often a rectangular tank surrounded on either end by circular sludge and sedimentation tanks (Fig. 10.16). As a general rule, sewage flows from the sedimentation tanks to aeration tanks and finally through to the sludge tanks. This design permits the use of the following types of restrictions on the location of the aeration tanks.

Rule 1: "Aeration tanks are located somewhere close to both the sludge tanks and the sedimentation tanks."

Fig. 10.16 Aerial image of a sewage plant.

The various tanks cannot occupy the same space, so:

Rule 2: "Aeration tanks must not be too close to either the sludge or sedimentation tanks."

Rule 1 is translated to the following network relations.

    CLOSE-TO(Union(Loc-Sludge-Tanks, Loc-Sed-Tanks), Distance X)

Rule 2 is translated to

    NOT-IN(Union(Loc-Sludge-Tanks, Loc-Sed-Tanks), Distance Y)

The network describing the probable location of the aeration tanks embodies both of these rules. Rule 1 determines an area that is close to both groupings of tanks, and Rule 2 eliminates a portion of that area. Thinking of the image as a point set, a set difference operation can remove the area given by Rule 2 from that specified by Rule 1. Figure 10.17 shows the final network that incorporates both rules. Of course, there could be places where the aeration tanks are located very far away or perhaps violate some other rule. It is important to note that, like the frames of Section 10.3.1, location networks give prototypical, likely locations for an object. They can work very well for stereotyped scenes, and might fail to perform in novel situations.

Fig. 10.17 Constraint network for aeration tank.

The Evaluation Mechanism

The network is interpreted (evaluated) by a program that works top-down in a recursive fashion, storing the partial results of each rule at the topmost node associated with that rule (with a few exceptions). Evaluation starts with the root node. In most networks, this node is an operation node. An operation node is evaluated by first evaluating all its arguments, and then applying its operation to those results. Its own result is then available to the node of the network that called for its evaluation.

Data nodes may already contain results which might come from a map or from the previous application of vision operators. At some point in the course of the evaluation, the evaluator may reach a node that has already been evaluated and is marked up-to-date or hypothesized (such a node contains the results of evaluation below that point). The results of this node are returned and used exactly as if it were a data node. Out-of-date nodes cause the evaluation mechanism to execute a low-level procedure to establish the location of the feature. If the procedure is unable to establish the status of the object firmly within its resource limits, the status will remain out-of-date. At any time, out-of-date nodes may be processed without having to recompute any up-to-date nodes. A node marked hypothesized has a value, usually supplied by an inference process, and not verified by low-level image analysis. Hypothesized data may be used in inferences: the results of all inferences based on hypothesized data are marked hypothesized as well.
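The aeration-tank rules and the top-down recursive evaluation can be sketched together in Python. This sketch assumes a tiny 12 x 12 image treated as a set of (row, col) pixels, Euclidean distance for CLOSE-TO, and made-up tank positions and values for Distance X and Distance Y. Rule 2 is realized, as the text suggests, by a set difference. Status marks and caching are omitted here to keep the rules visible.

```python
import math

# Hypothetical 12 x 12 image, treated as a point set of (row, col) pixels.
GRID = {(r, c) for r in range(12) for c in range(12)}


def close_to(points, distance):
    """Area operation: all grid points within `distance` of some point in `points`."""
    return {p for p in GRID if any(math.dist(p, q) <= distance for q in points)}


OPS = {
    "union": lambda a, b: a | b,
    "difference": lambda a, b: a - b,
    "close-to": close_to,
}


def evaluate(node):
    """Top-down recursive evaluation. A node is raw data (a point set or a
    number) or a tuple (operation, argument, ...); arguments are evaluated first."""
    if not isinstance(node, tuple):
        return node
    op, *args = node
    return OPS[op](*(evaluate(a) for a in args))


sludge = {(5, 2)}               # hypothetical detected tank locations
sedimentation = {(5, 9)}
tanks = ("union", sludge, sedimentation)
DISTANCE_X, DISTANCE_Y = 5, 2   # hypothetical thresholds
aeration = ("difference",
            ("close-to", tanks, DISTANCE_X),   # Rule 1
            ("close-to", tanks, DISTANCE_Y))   # Rule 2: area to remove
search_area = evaluate(aeration)
print((5, 5) in search_area, (5, 3) in search_area)  # True False
```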


Ifadatanodeeverhasitsvaluechanged (say,byanindependentprocessthat adds new information), all its ancestors are marked outofdate. Thus the root node willindicate an outofdate status, butonly those nodes on the outofdate pathmustbereevaluated tobringthenetwork uptodate.Figure 10.18showsthe operation oftheaeration tanknetwork ofFig.10.17ontheinputofFig. 10.16.In thiscasethe initialfeature datawereasinglesludgetankandasingle sedimenta tion tank. Suppose additional work isdone tofindthe location of the remaining sludgeandsedimenttanksintheimage.Thiscausesareevaluationofthenetwork, and the new result more accurately reflects the actual location of the aeration tanks. PropertiesofLocationNetworks Thelocationnetworkprovidesaverygeneralexampleofuseofsemanticnets incomputervision. 1. It serves as adata base of point sets and geometric information. The truth statusofitemsinthenetwork isexplicitlymaintainedanddependsonincom inginformation andoperationsperformedonthenet. 2. Itisanexpansionofageometricexpressionintoatree,whichmakestheorder of evaluation explicit and in which the partial results are kept for each geometric calculation.Thus itprovides efficient updatingwhen somebut not allthepartialresultschangeinareevaluation. 3. It provides away to makegeometrical inferences without losingtrack of the hypothetical nature ofassumptions. Thetreestructurerecords dependencies among hypotheses and geometrical results, and so upon invalidation of a geometric hypothesis the consequences (here, what other nodes have their valuesaffected) areexplicit.Therecordofdependenciessolvesamajorprob leminautomatedinferencesystems. 4. It reflects implicit universal quantification. The network claims to represent truerelationswhoseexplicitargumentsmustbefilledinasthenetworkis"in stantiated"withrealdata. 5. Ithasa"flat" semantics.Therearenoelementofhierarchiesorpartitions. 6. The concept of "individual" is flexible. 
A point set can contain multiple disconnected components corresponding to different world objects. In set operations, such an assemblage acts like an explicit set union of the components. An "individual" in the network may thus correspond to multiple individual point (sub)sets in the world.
7. The network allows use of partial knowledge. A set-theoretic semantics of existence and location allows modeling of an unknown location by the set-theoretic universe (the possible location is totally unconstrained). If something is known not to exist in a particular image, its "location" is the null set. Generally, a location is a point set.
8. The set-theoretic semantics allows useful punning on set union and the OR operation, and set intersection and the AND operation. If a dock is on the

Sec. 10.3 Semantic Net Examples

shoreline AND near a town, the search for docks need only be carried out in the intersection of the locations.
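The set-theoretic punning of properties 7 and 8 can be made concrete with ordinary set operations. In this hypothetical sketch the image is an 8 × 8 grid, an unknown location is the whole grid (the universe), a nonexistent object has the empty set as its location, and the shoreline and town regions are invented for illustration.

```python
# Hypothetical 8 x 8 image domain; locations are sets of (row, col) pixels.
UNIVERSE = frozenset((r, c) for r in range(8) for c in range(8))  # unconstrained location
NOWHERE = frozenset()                                             # known not to exist

def location_and(*locations):
    """AND of location constraints = intersection of point sets."""
    result = UNIVERSE
    for loc in locations:
        result = result & loc
    return result

def location_or(*locations):
    """OR of location constraints = union of point sets."""
    result = NOWHERE
    for loc in locations:
        result = result | loc
    return result

shoreline = frozenset((3, c) for c in range(8))                       # invented data
near_town = frozenset((r, c) for r in range(8) for c in range(5, 8))  # invented data

# Docks are on the shoreline AND near a town: search only the intersection.
dock_search_region = location_and(shoreline, near_town)
```

Here `dock_search_region` contains only the three shoreline pixels in columns 5 to 7. Intersecting any location with `UNIVERSE` leaves it unchanged, which is why an unknown location can be modeled that way, while intersecting with `NOWHERE` always yields the empty set.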


10.4 CONTROL ISSUES IN COMPLEX VISION SYSTEMS

Computer vision involves the control of large, complex information-processing tasks. Intelligent biological systems solve this control problem. They seem to have complicated control strategies, allowing dynamic allocation of computational resources, parallelism, interrupt-driven shifts of attention, and incremental behavior modification. This section explores different strategies for controlling the complex information processing involved in vision. Appendix 2 contains specific techniques and programming language constructs that have proven to be useful tools in implementing control strategies for artificial intelligence and computer vision.
10.4.1 Parallel and Serial Computation

In parallel computation, several computations are done at the same time. For example, different parts of an image may be processed simultaneously. One issue in parallel processing is synchronization: Is the computation such that the different parts can be done at different rates, or must they be kept in step with each other? Usually, the answer is that synchronization is important. Another issue in parallel processing is its implementation. Animal vision systems have the architecture to do parallel processing, whereas most computer systems are serial (although developing computer technologies may allow the practical realization of some parallel processing). On a serial computer parallelism must be simulated, and this is not always straightforward.

In serial computation, operations are performed sequentially in time whether or not they depend on one another. The implied sequential control mechanism is more closely matched to a (traditional) serial computer than is a parallel mechanism. Sequential algorithms must be stingy with their resources. This fact has had many effects in computer vision. It has led to mechanisms for efficient data access, such as multiple-resolution representations. It has also led some to emphasize cognitive alternatives for low-level visual processing, in the hope that the massive parallel computations performed in biological vision systems could be circumvented. However, this trend is reversing; cheaper computation and more pervasive parallel hardware should increase the commitment of resources to low-level computations.

Parallel and serial control mechanisms have both appeared in algorithms in earlier chapters. It seems clear that many low-level operations (correlation, intrinsic image computations) can be implemented with parallel algorithms. High-level operations, such as "planning" (Chapter 13), have inherently serial components. In general, in the low levels of visual processing control is predominantly parallel, whereas at the more abstract levels some useful computations are necessarily serial in nature.
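To make the parallel/serial distinction concrete, the following Python sketch applies a local operation (a 3 × 3 box average, chosen here only as an example of a low-level operation) to horizontal strips of an image concurrently, then checks that the result matches the serial version. Each strip reads the shared input and writes only its own output rows, so no synchronization is needed beyond collecting the results in order. The function names are hypothetical, and because of CPython's global interpreter lock the thread pool illustrates the decomposition rather than delivering a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def smooth_rows(image, rows):
    """3 x 3 box average for the given output rows; border pixels average fewer neighbors."""
    h, w = len(image), len(image[0])
    out = []
    for y in rows:
        row = []
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def smooth_serial(image):
    """Serial version: one pass over all rows in order."""
    return smooth_rows(image, range(len(image)))

def smooth_parallel(image, n_workers=4):
    """Parallel version: independent strips, results reassembled in order."""
    h = len(image)
    step = -(-h // n_workers)                        # ceiling division
    strips = [range(s, min(h, s + step)) for s in range(0, h, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda rows: smooth_rows(image, rows), strips))
    return [row for part in parts for row in part]   # pool.map preserves strip order
```

Because every output pixel depends only on a fixed neighborhood of the input, the strips can be computed in any order or all at once; an operation whose output fed back into its own input would need explicit synchronization between strips.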
10.4.2 Hierarchical and Heterarchical Control

Visual control strategies dictate the flow of information and activity through the representational layers. What triggers processing: a low-level input like a color patch on the retina, or a high-level expectation (say, expecting to see a red car)? Different emphasis on these extremes is a basic control issue. The two extremes may be characterized as follows.

1. Image data driven. Here the control proceeds from the construction of the generalized image to segmented structures and finally to descriptions. This is also called bottom-up control.
2. Internal model driven. Here high-level models in the knowledge base generate expectations or predictions of geometric, segment, or generalized image structure in the input. Image understanding is the verification of these predictions. This is also called top-down control.

Top-down and bottom-up control are distinguished not by what they do but rather by the order in which they do it and how much of it they do. Both approaches can utilize all the basic representations (intrinsic images, features, geometric structures, and propositional representations), but the processing within these representations is done in different orders.

The division of control strategies into top-down and bottom-up is a rather simplistic one. There is evidence that attentional mechanisms may be some of the most complicated brain functions that human beings have [Geschwind 1980]. The different representational subsystems in a complex vision system influence each other in sophisticated and intricate ways; whether control flows "up" or "down" is only a broad characterization of local influence in the (loosely ordered) layers of the system.

The term "bottom-up" was originally applied to parsing algorithms for formal languages that worked their way up the parse tree, assembling the input into structures as they did so. "Top-down" parsers, on the other hand, notionally started at the top of the parse tree and worked downward, effectively generating expectations or predictions about the input based on the possibilities allowed by the grammar; the verification of these predictions confirmed a particular parsing. These two paradigms are still basic in artificial intelligence, and provide powerful analogies and methods for reasoning about and performing many information-processing tasks. The bottom-up paradigm is comparable in spirit with "forward chaining," which derives further consequences from established results. The top-down paradigm is reflected in "backward chaining," which breaks problems up into subproblems to be solved.
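The forward- and backward-chaining analogy can be sketched over a tiny rule base. The rules below (premises and conclusions about roads and bridges) are invented for illustration. `forward_chain` works bottom-up, adding consequences of established facts until nothing new appears; `backward_chain` works top-down, reducing a goal to subgoals until they bottom out in known facts.

```python
# Hypothetical rules: (set of premises, conclusion).
RULES = [
    (frozenset({"elongated region", "uniform texture"}), "road candidate"),
    (frozenset({"road candidate", "connects to known road"}), "road"),
    (frozenset({"road", "crosses water"}), "bridge"),
]

def forward_chain(facts, rules):
    """Bottom-up: derive every consequence of the established facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def backward_chain(goal, facts, rules, pending=frozenset()):
    """Top-down: a goal holds if it is a fact or some rule's premises all hold."""
    if goal in facts:
        return True
    if goal in pending:          # guard against circular rule chains
        return False
    return any(
        all(backward_chain(p, facts, rules, pending | {goal}) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )
```

Forward chaining computes every derivable conclusion whether or not it is wanted; backward chaining touches only the rules relevant to the goal it was asked about, which mirrors the "how much of it they do" distinction drawn above.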
These control organizations can be used not only "tactically" to accomplish specific tasks, but they can dictate the whole "strategy" of the vision campaign. We shall discover that in their pure forms the extreme strategies (top-down and bottom-up) appear inadequate to explain or implement vision. More flexible organizations which incorporate both top-down and bottom-up components seem more suited to a broad spectrum of ambitious vision tasks.

Bottom-Up Control

The general outline for bottom-up vision processing is:

1. PREPROCESS. Convert raw data into more usable intrinsic forms, to be interpreted by the next level. This processing is automatic and domain-independent.
2. SEGMENT. Find visually meaningful image objects, perhaps corresponding to world objects or their parts. This process is often but not always broken up into (a) the extraction of meaningful visual primitives, such as lines or regions of homogeneous composition (based on their local characteristics); and (b) the agglomeration of local image features into larger segments.
3. UNDERSTAND. Relate the image objects to the domain from which the image arose. For instance, identify or classify the objects. As a step in this process, or indeed as the final step in the computer vision program, the image objects and the relations between them may be described.

In pure bottom-up organization each stage yields data for the next. The progression from raw data to interpreted scene may actually proceed in many steps; the different representations at each step allow us to separate the process into the main steps mentioned above.

Bottom-up control is practical if potentially useful "domain-independent" processing is cheap. It is also practical if the input data are accurate and yield reliable and unambiguous information for the higher-level visual processes. For example, the binary images that result from careful illumination engineering and input thresholding can often be processed quite reliably and quickly in a bottom-up mode. If the data are less reliable, bottom-up styles may still work if they make only tolerably few errors on each pass.

Top-Down Control

A bottom-up, hierarchical model of perception is at first glance appealing on neurological and computational grounds, and has influenced much classical philosophical thought and psychological theory. The "classical" explanation of perception has relatively recently been augmented by a more cognition-based one involving (for instance) interaction of knowledge and expectations with the perceptual process in a more top-down manner [Neisser 1967; Bartlett 1932]. A similar evolution of the control of computer vision processing has accounted for the augmentation of the pure "pattern recognition" paradigm with more "cognitive" paradigms. The evidence seems overwhelming that there are vision processes which do not "run bottom-up," and it is one of the major themes of this book that internal models, goals, and cognitive processes must play major roles in computer vision [Gregory 1970; Buckhout 1974; Gombrich 1972]. Of course, there must be a substantial component of biological vision systems which can perform in a noncognitive mode.
There are probably no versions of top-down organization for computer vision that are as pure as the bottom-up ones. The model to keep in mind in top-down perception is that of goal-directed processing. A high-level goal spawns subgoals which are attacked, again perhaps yielding sub-subgoals, and so on, until the goals are simple enough to solve directly. A common top-down technique is "hypothesize-and-verify"; here an internal modeling process makes predictions about the way objects will act and appear. Perception becomes the verifying of predictions or hypotheses that flow from the model, and the updating of the model based on such probes into the perceptual environment [Bolles 1977]. Of course, our goal-driven processes may be interrupted and resources diverted to respond to the interrupt (as when movement in the visual periphery causes us to look toward the moving object). Normally, however, the hypothesis-verification paradigm requires relatively little information from the lower levels, and in principle it can control the low-level computations.
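A minimal hypothesize-and-verify sketch, under strong simplifying assumptions: the model is a bright square of known size, the hypotheses are candidate positions restricted by prior knowledge to a rectangular region, and verification examines only the pixels a hypothesis predicts rather than processing the whole image. All names and the mean-brightness test are hypothetical, and the updating of the model mentioned above is omitted.

```python
def hypotheses(model_size, image_shape, prior_region):
    """Model-driven predictions: candidate top-left corners inside a prior region."""
    (r0, r1), (c0, c1) = prior_region
    rows, cols = image_shape
    for r in range(max(0, r0), min(r1, rows - model_size) + 1):
        for c in range(max(0, c0), min(c1, cols - model_size) + 1):
            yield (r, c)

def verify(image, corner, model_size, threshold):
    """Examine only the pixels the hypothesis predicts should be bright."""
    r, c = corner
    window = [image[r + i][c + j] for i in range(model_size) for j in range(model_size)]
    return sum(window) / len(window) >= threshold

def hypothesize_and_verify(image, model_size, prior_region, threshold=0.8):
    """Return the first verified hypothesis, or None if all are rejected."""
    shape = (len(image), len(image[0]))
    for corner in hypotheses(model_size, shape, prior_region):
        if verify(image, corner, model_size, threshold):
            return corner
    return None
```

The amount of low-level work is governed by how tightly the model constrains the hypotheses: a smaller prior region means fewer windows are ever examined.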

The desire to circumvent unnecessary low-level processing in computer vision is understandable. Our low-level vision system performs prodigious amounts of information processing in several cascaded parallel layers. With serial computation technology, it is very expensive to duplicate the power of our low-level visual system. Current technological developments are pointing toward making parallel, low-level processing feasible and thus lowering this price. In the past, however, the price has been so heavy that much research has been devoted to avoiding it, often by using domain knowledge to drive a more or less top-down perception paradigm. Thus there are two reasons to use a top-down control mechanism. First, it seems to be something that human beings do and to be of interest in its own right. Second, it seems to offer a chance to accomplish visual tasks without impractical expenditure of resources.

Mixed Top-Down and Bottom-Up Control

In actual computer vision practice, a judicious mixture of data-driven analysis and model-driven prediction often seems to perform better than either style in isolation. This meld of control styles can sometimes be implemented in a complex hierarchy with a simple pass-oriented control structure. An example of mixed organization is provided by a tumor detection program which locates small nodular tumors in chest radiographs [Ballard 1976]. The data-driven component is needed because it is not known precisely where nodular tumors may be expected in the input radiograph; there is no effective model-driven location hypothesizing scheme. On the other hand, a distinctly top-down flavor arises from the exploitation of what little is known about lung tumor location (they are found in lungs) and tumor size. The variable-resolution method using pyramids, in which data are examined in increasingly fine detail, also seems top-down. In the example, work done at 1/16 resolution in a consolidated array guides further processing at 1/4 resolution. Only when small windows of the input array are isolated for attention are they considered at full resolution. The process proceeds in three passes which move from less to greater detail (Fig. 10.19), zooming in on interesting areas of the image, and ultimately finding objects of interest (nodules). Two later passes (not shown) "understand" the nodules by classifying them as "ghosts," tumors, or nontumors. Within pass II, there is a distinct data-driven (bottom-up) organization, but passes I and III have a model-directed (top-down) philosophy. This example shows that a relatively simple, pass-oriented control structure may implement a mixture of top-down and bottom-up components which focus attention efficiently and make the computation practical. It also shows a few places where the ordering of steps is not inherently sequential, but could logically proceed in parallel. Two examples are the overlapping of the high-pass filtering of pass II with pass I, and parallel exploration of candidate nodule sites in pass III.

Heterarchical Control

The word "heterarchy" seems to be due to McCulloch, who used it to describe the nonhierarchical (i.e., not partially ordered in rank) nature of neural responses implied by their connectivity in the brain. It was used in the early 1970s to characterize a particular style of nonhierarchical, non-pass-structured control
PREPROCESS
Pass 0 (Digitize radiograph). The digitizer has a hardware attachment which produces the optical density.

SEGMENT
Pass I (Find lung boundaries); control: TOP-DOWN. In 64 × 56 consolidated array, apply gradient at proper resolution. In 64 × 56 array, find rough lung outline; in 256 × 224 array, refine lung outline.
Pass II (Find candidate nodule sites and large tumors); control: BOTTOM-UP. In 256 × 224 array, apply high-pass filter to enhance edges, then inside lung boundaries apply gradient at proper resolution. In 256 × 224 array, use gradient-directed, circular Hough method to find candidate sites; also detect large tumors.
Pass III (Find nodule boundaries); control: TOP-DOWN. From 1024 × 896 array, extract 64 × 64 window about each candidate nodule site, then in window apply high-pass filter for edge enhancement; then apply gradient at proper resolution. In 64 × 64 full-resolution, preprocessed window, apply dynamic programming technique to find accurate nodule boundaries.

Fig. 10.19 A hierarchical tumor-detection algorithm. Technical details of the methods are found elsewhere in this volume. The processing proceeds in passes from top to bottom, and within each pass from left to right. The processing exhibits both top-down and bottom-up characteristics.

organization. Rather than a hierarchical structure (such as the military), one should imagine a community of cooperating and competing experts. They may be organized in their effort by a single executive, by a universal set of rules governing their behavior, or by an a priori system of ranking. If one can think of a task as consisting of many smaller subtasks, each requiring some expertise, and not necessarily performed globally in a fixed order, then the task could be suitable for a heterarchical control structure. The idea is to use, at any given time, the expert who can help most toward final task solution. The expert may be the most efficient, or reliable, or may give the most information; it is selected because according to some criterion its subtask is the best thing to do at that time. The criteria for selection are wide and varied, and several ideas have been tried. The experts may compute their own relevance, and the decision can be made on the basis of those individual local evaluations (as in PANDEMONIUM [Selfridge 1959]). They may be assigned a priori immutable rank, so that the highest-ranking expert that is applicable is always run (as in [Shirai 1975; Ambler et al. 1975]). Empirically predetermined and dynamically situation-driven information can also be combined to decide which expert applies.

The actual control structure of heterarchical programming can be quite simple; it can be a single iterative loop in which the best action to take is chosen, applied, and interpreted (Fig. 10.20).
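The loop of Fig. 10.20 can be sketched in a few lines of Python. In this hypothetical version each expert computes its own relevance from the current knowledge (the PANDEMONIUM-style option mentioned above); a fixed a priori ranking could be substituted by having `relevance` return a constant rank whenever the expert is applicable. The experts, knowledge keys, and values in the example are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Knowledge = Dict[str, object]

@dataclass
class Expert:
    name: str
    relevance: Callable[[Knowledge], float]  # how much this expert can help now
    act: Callable[[Knowledge], Knowledge]    # performs its subtask, returns new facts

def executive(experts: List[Expert], knowledge: Knowledge,
              done: Callable[[Knowledge], bool],
              max_steps: int = 100) -> Tuple[Knowledge, List[str]]:
    """Choose the best action, perform it, interpret its results; repeat until done."""
    trace = []
    for _ in range(max_steps):
        if done(knowledge):
            break
        best = max(experts, key=lambda e: e.relevance(knowledge))  # choose
        if best.relevance(knowledge) <= 0:
            break                                                  # nobody can help
        new_facts = best.act(knowledge)                            # perform
        knowledge = {**knowledge, **new_facts}                     # interpret
        trace.append(best.name)
    return knowledge, trace

# Invented experts; note the list order does not fix the order in which they run.
experts = [
    Expert("label", lambda k: 3.0 if "regions" in k and "label" not in k else 0.0,
           lambda k: {"label": "road"}),
    Expert("edges", lambda k: 1.0 if "edges" not in k else 0.0,
           lambda k: {"edges": "edge map"}),
    Expert("regions", lambda k: 2.0 if "edges" in k and "regions" not in k else 0.0,
           lambda k: {"regions": "region map"}),
]
final, trace = executive(experts, {}, done=lambda k: "label" in k)
```

With these experts the loop runs "edges", then "regions", then "label", because at each step the relevance scores, not the list order, decide which expert is best.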
10.4.3 BeliefMaintenanceandGoalAchievement

Belief maintenance and goal achievement are high-level processes that imply differing control styles. The former is concerned with maintaining a current state, the latter with a set of future states. Belief maintenance is an ongoing activity which can ensure that perceptions fit together in a coherent way. Goal achievement is the integration of vision into goal-directed activities such as searching for objects and navigation. There may be "unconscious" use of goal-seeking techniques (e.g., eye movement control).

Belief Maintenance

An organism is presented with a rich visual input to interpret. Typically, it all makes sense: chairs and tables are supported by floors, objects have expected shapes and colors, objects appear to flow past as the organism moves, nearer objects obscure farther ones, and so on. However, every now and then something

[Flowchart: START; choose the best action based on what is known so far; perform it; interpret its results to increase knowledge; done? If no, return to the choice step; if yes, STOP.]

Fig. 10.20 A main executive control loop for heterarchical vision.

enters the visual field that does not meet expectations. An unfamiliar object in a familiar environment or a sudden movement in the visual periphery can be "surprises" that do not fit in with our existing beliefs and thus have to be reckoned with. It is sometimes impossible to ignore movements in our visual periphery, but if we are preoccupied it is easily possible to stay unconscious of small changes in our environment. How is it possible to notice some things and not others? The belief maintenance mechanism seems to be resource-limited. A certain amount of "computing resource" is allocated for the job. With this resource, only a limited amount of checking can be done. Checks to be made are ranked (somehow responses to events in the periphery are like reflexes, or high-priority hard-wired interrupts) and those that cannot be done within the resource limit are omitted. Changes in our beliefs are often initiated in a bottom-up way, through unexpected inputs.

A second characteristic of belief maintenance is the almost total absence of sequential, simulation-based or "symbolic" planning or problem-solving activity. Our beliefs are "in the present"; manipulation of hypothetical worlds is not belief maintenance. "Truth maintenance" schemes have been discussed in various contexts [Doyle 1979; Stallman and Sussman 1977]. We conjecture that constraint-satisfaction (relaxation) mechanisms (Chapters 3, 7, and 12) are computationally suited to maintaining belief structures. They can operate in parallel, they seek to minimize inconsistency, and they can tolerate "noise" in either input or axioms. Relaxation techniques are usually applied to low-level visual input where locally noisy parameters are combined into globally consistent intrinsic images. Chapter 12 is concerned with inference, in which constraint relaxation is applied to higher-level entities.

Characteristics of Goal Achievement

Goal achievement involves two related activities: planning and acting. Planning is a simulation of the world designed to generate a plan. A plan is a sequence of actions that, if carried out, should achieve a goal. Actions are the primitives that can modify the world.
The motivation for planning is survival. By being able to simulate the effects of various actions, a human being is able to avoid dangerous situations. In an analogous fashion, planning can help machines with vision. For example, a Mars rover can plan its route so as to avoid steep inclines where it might topple over. The incline measurement is made by processing visual input. Since planning involves a sequence of actions, each of which if carried out could potentially change the world, and since planning does not involve actually making those changes, the difficult task of the planner is to keep track of all the different world states that could result from different action sequences.

Vision can clearly serve as an important information-gathering step in planning actions. Can planning techniques be of use directly to the vision process? Clearly so in "skilled vision," such as photointerpretation. Also, planning is a useful computational mechanism that need not be accompanied by conscious, cognitive behavior.
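The rover example can be sketched as a search over simulated world states. In this hypothetical Python version the world state is just the rover's grid cell, the actions are moves N, S, E, and W, the per-cell slope values (assumed to come from visual processing) are invented, and breadth-first search stands in for a planner: it simulates action sequences without executing them, keeps track of every state reached, and returns a shortest plan that never enters a cell steeper than `max_slope`.

```python
from collections import deque

MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

def plan_route(slope, start, goal, max_slope):
    """Breadth-first search over simulated world states (rover positions)."""
    h, w = len(slope), len(slope[0])
    frontier = deque([start])
    came_from = {start: None}            # state -> (previous state, action)
    while frontier:
        state = frontier.popleft()
        if state == goal:                # rebuild the action sequence
            actions = []
            while came_from[state] is not None:
                state, action = came_from[state]
                actions.append(action)
            return actions[::-1]
        for action, (dr, dc) in MOVES.items():
            r, c = state[0] + dr, state[1] + dc
            nxt = (r, c)
            # Simulate the move; reject states that leave the map or are too steep.
            if 0 <= r < h and 0 <= c < w and slope[r][c] <= max_slope and nxt not in came_from:
                came_from[nxt] = (state, action)
                frontier.append(nxt)
    return None                          # no safe plan exists
```

For a 3 × 3 map whose middle row is steep except at its right end, the plan from the top-left to the bottom-left corner detours around the incline: E, E, S, S, W, W.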

These inductive conclusions leading to the formation of our sense perceptions certainly do lack the purifying and scrutinizing work of conscious thinking. Nevertheless, in my opinion, by their particular nature they may be classed as conclusions, inductive conclusions unconsciously formed. [Helmholtz 1925]

The character of computations in goal achievement is related to the inference mechanisms studied in Chapter 12, except that planning is distinguished by being dynamic through time. Inference (Chapter 12) is concerned with the knowledge base and deducing relations that logically follow from it. The primitives are propositions. In planning (Chapter 13) the primitives are actions, which are inherently more complex than propositions. Also, planning need not be a purely deductive mechanism; instead it can be integrated with visual "acting," or the interpretation of visual input. Often, a long deductive sequence may be obviated by using direct visual inspection. This raises a crucial point: Given the existence of plans, how does one choose between them? The solution is to have a method of scoring plans based on some measure of their effectiveness.

EXERCISES

10.1 (a) Diagram some networks for a simple dial telephone, at various levels of detail and with various complexities of relations.
(b) Now include in your network dial and push-button types.
(c) Embed the telephone frame into an office frame, describing where the telephone should be found.

10.2 Is a LISP vision program an analogical or propositional representation of knowledge?

10.3 Write a semantic net for the concept "leg," and use it to model human beings, tables, and spiders. Represent the fact "all tables have four legs." Can your "leg" model be shared between tables and spiders? Shared within spiders?

REFERENCES
AMBLER, A. P., H. G. BARROW, C. M. BROWN, R. M. BURSTALL, and R. J. POPPLESTONE. "A versatile system for computer-controlled assembly." Artificial Intelligence 6, 2, 1975, 129-156.
ANDERSON, J. R. and G. H. BOWER. Human Associative Memory. New York: V. H. Winston & Sons, 1973.
BALLARD, D. H. Hierarchic Recognition of Tumors in Chest Radiographs. Basel: Birkhauser-Verlag (ISR 16), 1976.
BALLARD, D. H. "Model-directed detection of ribs in chest radiographs." TR11, Computer Science Dept., Univ. Rochester, March 1978.
BALLARD, D. H., U. SHANI, and R. B. SCHUDY. "Anatomical models for medical images." Proc., 3rd COMPSAC, November 1979, 565-570.
BARROW, H. G. and J. M. TENENBAUM. "Computational vision." Proc. IEEE 69, 5, May 1981, 572-595.
BARTLETT, F. C. Remembering: A Study in Experimental and Social Psychology. Cambridge: Cambridge University Press, 1932.
BOBROW, D. G. and T. WINOGRAD. "An overview of KRL-0, a knowledge representation language." Cognitive Science 1, 1, 1977, 3-46.
BOBROW, D. G. and T. WINOGRAD. "KRL: another perspective." Cognitive Science 3, 1, 1979, 29-42.
BOLLES, R. C. "Verification vision for programmable assembly." Proc., 5th IJCAI, August 1977, 569-575.
BRACHMAN, R. J. "What's in a concept? Structural foundations for semantic networks." Report 3433, Bolt, Beranek and Newman, October 1976.
BRACHMAN, R. J. "On the epistemological status of semantic networks." In Associative Networks: Representation and Use of Knowledge by Computers, N. V. Findler (Ed.). New York: Academic Press, 1979, 3-50.
BROOKS, R. A. and T. O. BINFORD. "Representing and reasoning about specified scenes." Proc., DARPA IU Workshop, April 1980, 95-103.
BUCKHOUT, R. "Eyewitness testimony." Scientific American, December 1974, 23-31.
DOYLE, J. "A truth maintenance system." Artificial Intelligence 12, 3, 1979.
FAHLMAN, S. E. "A planning system for robot construction tasks." Artificial Intelligence 5, 1, 1974, 1-49.
FINDLER, N. V. (Ed.). Associative Networks: Representation and Use of Knowledge by Computers. New York: Academic Press, 1979.
FODOR, J. D., J. A. FODOR, and M. F. GARRETT. "The psychological unreality of semantic representations." Linguistic Inquiry 4, 1975, 515-531.
FREUDER, E. C. "A computer system for visual recognition using active knowledge." Ph.D. dissertation, MIT, 1975.
FUNT, B. V. "WHISPER: a problem-solving system utilizing diagrams." Proc., 5th IJCAI, August 1977, 459-464.
GARVEY, J. D. "Perceptual strategies for purposive vision." Technical Note 117, AI Center, SRI International, 1976.
GELERNTER, H. "Realization of a geometry theorem proving machine." In Computers and Thought, E. Feigenbaum and J. Feldman (Eds.). New York: McGraw-Hill, 1963.
GESCHWIND, N. "Neurological knowledge and complex behaviors." Cognitive Science 4, 2, April 1980, 185-193.
GOMBRICH, E. H. Art and Illusion. Princeton, NJ: Princeton University Press, 1972.
GREGORY, R. L. The Intelligent Eye. New York: McGraw-Hill, 1970.
HAYES, Patrick J. "The logic of frames." In The Frame Reader. Berlin: de Gruyter, 1980.
HAYES, Philip J. "Some association-based techniques for lexical disambiguation by machine." Ph.D. dissertation, École Polytechnique Fédérale de Lausanne, 1977; also TR25, Computer Science Dept., Univ. Rochester, June 1977.
HELMHOLTZ, H. von. Treatise on Physiological Optics (translated by J. P. C. Southall). New York: Dover Publications, 1925.
HENDRIX, G. G. "Expanding the utility of semantic networks through partitions." Proc., 4th IJCAI, September 1975, 115-121.
HENDRIX, G. G. "Encoding knowledge in partitioned networks." In Associative Networks: Representation and Use of Knowledge by Computers, N. V. Findler (Ed.). New York: Academic Press, 1979, 51-92.
HEWITT, C. "Description and theoretical analysis (using schemata) of PLANNER" (Ph.D. dissertation). AI-TR-258, AI Lab, MIT, 1972.
HINTON, G. E. "Relaxation and its role in vision." Ph.D. dissertation, Univ. Edinburgh, December 1977.
JOHNSON-LAIRD, P. N. "Mental models in cognitive science." Cognitive Science 4, 1, January-March 1980, 71-115.
KOSSLYN, S. M. and J. R. POMERANTZ. "Imagery, propositions and the form of internal representations." Cognitive Psychology 9, 1977, 52-76.
KOSSLYN, S. M. and S. P. SCHWARTZ. "A simulation of visual imagery." Cognitive Science 1, 3, July 1977, 265-295.
KOSSLYN, S. M. and S. P. SCHWARTZ. "Visual images as spatial representations in active memory." In CVS, 1978.
LEHNERT, W. and Y. WILKS. "A critical perspective on KRL." Cognitive Science 3, 1, 1979, 1-28.
LOZANO-PEREZ, T. and M. A. WESLEY. "An algorithm for planning collision-free paths among polyhedral obstacles." Comm. ACM 22, 10, October 1979, 560-570.
MCDERMOTT, D. "Artificial intelligence meets natural stupidity." SIGART Newsletter 57, April 1976, 4-9.
MINSKY, M. L. "A framework for representing knowledge." In PCV, 1975.
MOORE, J. and A. NEWELL. "How can MERLIN understand?" In Knowledge and Cognition, L. Gregg (Ed.). Hillsdale, NJ: Lawrence Erlbaum Assoc., 1973.
MOORE, R. C. "Reasoning about knowledge and action." Technical Note 191, AI Center, SRI International, 1979.
NEISSER, U. Cognitive Psychology. New York: Appleton-Century-Crofts, 1967.
NEVATIA, R. and K. E. PRICE. "Locating structures in aerial images." USCIPI Report 800, Image Processing Institute, Univ. Southern California, March 1978, 41-58.
NILSSON, N. J. Problem-Solving Methods in Artificial Intelligence. New York: McGraw-Hill, 1971.
NILSSON, N. J. Principles of Artificial Intelligence. Palo Alto, CA: Tioga, 1980.
PALMER, S. E. "Visual perception and world knowledge: notes on a model of sensory-cognitive interaction." In Explorations in Cognition, D. A. Norman, D. E. Rumelhart, and the LNR Research Group (Eds.). San Francisco: W. H. Freeman, 1975.
PYLYSHYN, Z. W. "What the mind's eye tells the mind's brain: a critique of mental imagery." Psychological Bulletin 80, 1973, 1-24.
QUILLIAN, M. R. "Semantic memory." In Semantic Information Processing, M. Minsky (Ed.). Cambridge, MA: MIT Press, 1968.
ROBERTS, R. B. and I. P. GOLDSTEIN. "The FRL primer." AI Memo 408, AI Lab, MIT, 1977.
RUMELHART, D. E., P. H. LINDSAY, and D. A. NORMAN. "A process model for long-term memory." In Organization of Memory, E. Tulving and J. Donaldson (Eds.). New York: Academic Press, 1972.
RUSSELL, D. M. "Where do I look now?" Proc., PRIP, August 1979, 175-183.
SCHANK, R. C. Conceptual Information Processing. Amsterdam: North-Holland, 1975.
SCHANK, R. C. and R. P. ABELSON. Scripts, Plans, Goals and Understanding. Hillsdale, NJ: Lawrence Erlbaum Assoc., 1977.
SCHUBERT, L. K. "Extending the expressive power of semantic networks." Artificial Intelligence 7, 2, 1976, 163-198.
SELFRIDGE, O. "Pandemonium, a paradigm for learning." In Proc., Symp. on the Mechanisation of Thought Processes, National Physical Laboratory, Teddington, England, 1959.
SHEPARD, R. N. "The mental image." American Psychologist 33, 1978, 125-137.
SHIRAI, Y. "Analyzing intensity arrays using knowledge about scenes." In PCV, 1975.
SLOMAN, A. "Interactions between philosophy and artificial intelligence: the role of intuition and non-logical reasoning in intelligence." Artificial Intelligence 2, 3/4, 1971, 209-225.
STALLMAN, R. M. and G. J. SUSSMAN. "Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis." Artificial Intelligence 9, 2, 1977, 135-196.
STEFIK, M. "An examination of a frame-structured representation system." Proc., 6th IJCAI, August 1979, 845-852.
SUSSMAN, G. J. and D. MCDERMOTT. "Why conniving is better than planning." AI Memo 255A, AI Lab, MIT, 1972.
WALTZ, D. and L. BOGGESS. "Visual analog representations for natural language understanding." Proc., 6th IJCAI, August 1979, 226-234.
WINOGRAD, T. "Extended inference modes in reasoning by computer systems." Proc., Conf. on Inductive Logic, Oxford Univ., August 1978.
WINOGRAD, T. "Frame representations and the declarative/procedural controversy." In Representation and Understanding, D. G. Bobrow and A. M. Collins (Eds.). New York: Academic Press, 1975, 185-210.
WINSTON, P. H. "Learning structural descriptions from examples." In PCV, 1975.
WINSTON, P. H. Artificial Intelligence. Reading, MA: Addison-Wesley, 1977.
WOODS, W. A. "What's in a link? Foundations for semantic networks." In Representation and Understanding, D. G. Bobrow and A. M. Collins (Eds.). New York: Academic Press, 1975.

11 MATCHING

11.1 ASPECTS OF MATCHING

11.1.1 Interpretation: Construction, Matching, and Labeling

Figure 10.1 shows a vision system organization in which there are several representations for visual entities. A complex vision system will at any time have several coexisting representations for visual inputs and other knowledge. Perception is the process of integrating the visual input with the preexisting representations, for whatever purpose. Recognition, belief maintenance, goal-seeking, or building complex descriptions all involve forming or finding relations between internal representations. These correspondences match ("model," "re-represent," "abstract," "label") entities at one level with those at another level.

Ultimately, matching "establishes an interpretation" of input data, where an interpretation is the correspondence between models represented in a computer and the external world of phenomena and objects. To do this, matching associates different representations, hence establishing a connection between their interpretations in the world. Figure 11.1 illustrates this point. Matching associates TOKNODE, a token for a linear geometric structure derived from image segmentation efforts, with a model token NODE101 for a particular road. The token TOKNODE has the interpretation of an image entity; NODE101 has the interpretation of a particular road.

One way to relate representations is to construct one from the other. An example is the construction of an intrinsic image from raw visual input. Bottom-up construction in a complex visual system is for reliably useful, domain-independent, goal-independent processing steps. Such steps rely only on "compiled-in" ("hard-wired," "innate") knowledge supplied by the designer of the system. Matching becomes more important as the needed processing becomes more diverse and idiosyncratic to an individual's experience, goals, and

[Figure: two panels, labeled "Input" and "Reference".]

Fig. 11.1 Matching and interpretation.

knowledge. Thus as processing moves from "early" to "late," control shifts from bottom-up toward top-down, and existing knowledge begins to dominate perception.

This chapter deals with some aspects of matching, in which two already existing representations are put into correspondence. When the two representations are similar (both are images or relational structures, say), "matching" can be used in its familiar sense. When the representations are different (one image and one geometric structure, say), we use "matching" in an extended sense; perhaps "fitting" would be better. This second sort of matching usually has a top-down or expectation-driven flavor; a representation is being related to a preexisting one.

As a final extension to the meaning of matching, matching might include the process of checking a structure with a set of rules describing structural legality, consistency, or likelihood. In this sense a scene can be matched against rules to see if it is nonsense or to assign an interpretation. One such interpretation process (called labeling) assigns consistent or optimally likely interpretations (labels) at one level to entities of another level. Labeling is like matching a given structure with a possibly infinite set of acceptable structures to find the best fit. However, we (fairly arbitrarily) treat labeling in Chapter 12 as extended inference rather than here as extended matching.
11.1.2 Matching Iconic, Geometric, and Relational Structures

Chapter 3 presented various correlation techniques for matching iconic (imagelike) structures with each other. The bulk of this chapter, starting in Section 11.2, deals with matching relational (semantic net) structures. Another important sort of matching between two dissimilar representations fits data to parameterized models (usually geometric). This kind of matching is an important part of computer vision. A typical example is shown in Fig. 11.2. A preexisting representation (here a straight line) is to be used to interpret a set of input data. The line that best "explains" the data is (by definition) the line of "best fit." Notice that the decision to use a line (rather than a cubic, or a piecewise linear template) is made at a higher level. Given the model, the fitting or matching means determining the parameters of the model that tailor it into a useful abstraction of the data.

Sometimes there is no parameterized mathematical model to fit, but rather a given geometric structure, such as a piecewise linear curve representing a shoreline in a map which is to be matched to a piece of shoreline in an image, or to another piecewise linear structure derived from such a shoreline. These geometric matching problems are not traditional mathematical applications, but they are similar in that the best match is defined as the one minimizing a measure of disagreement. Often, the computational solutions to such geometric matching problems exhibit considerable ingenuity. For example, the shore-matching example above may proceed by finding that position for the segment of shore to be matched that minimizes some function (perhaps the square) of a distance metric (perhaps Euclidean) between input points on the iconic image shoreline and the nearest point on the reference geometric map shoreline. To compute the smallest distance between an arbitrary point and a piecewise linear point set is not a trivial task, and this calculation may have to be performed often to find the best match. The computation may be reduced to a simple table lookup by precomputing the metric in a "chamfer array" that contains the metric of disagreement for any point around the geometric reference shoreline [Barrow et al. 1978]. The array may be computed efficiently by symmetric axis transform techniques (Chapter 8) that "grow" the linear structure outward in contours of equal disagreement (distance) until a value has been computed for each point of the chamfer array.
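The chamfer-array idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the method of [Barrow et al. 1978]: the reference shoreline is assumed to be already rasterized into a binary grid, the city-block (L1) distance stands in for the Euclidean metric, and the table is filled by a standard two-pass raster distance transform rather than by a symmetric axis transform. All function names are hypothetical. Once the table exists, scoring a candidate placement of the input points is pure table lookup.

```python
INF = float("inf")

def chamfer_array(ref):
    """City-block distance from every cell to the nearest reference (1) cell,
    filled by one forward and one backward raster pass."""
    rows, cols = len(ref), len(ref[0])
    d = [[0 if ref[r][c] else INF for c in range(cols)] for r in range(rows)]
    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < cols - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d

def placement_cost(chamfer, points, dr, dc):
    """Sum of squared distances after shifting the input points by (dr, dc).
    Each distance is a table lookup; a point shifted off the array costs INF."""
    rows, cols = len(chamfer), len(chamfer[0])
    total = 0
    for r, c in points:
        rr, cc = r + dr, c + dc
        if not (0 <= rr < rows and 0 <= cc < cols):
            return INF
        total += chamfer[rr][cc] ** 2
    return total

def best_shift(chamfer, points, max_shift):
    """Exhaustively try integer shifts and return the one of least cost."""
    shifts = [(dr, dc) for dr in range(-max_shift, max_shift + 1)
              for dc in range(-max_shift, max_shift + 1)]
    return min(shifts, key=lambda s: placement_cost(chamfer, points, *s))
```

For example, with a vertical reference "shoreline" in column 2 of a 5 by 5 grid and input points lying in column 0, `best_shift` recovers the shift (0, 2). Only translations are searched here; rotations would need a larger search space.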
Parameter optimization techniques can relate geometrical structures to lower-level representations and to each other through the use of a merit function measuring how well the relations match. The models are described by a vector of parameters $\mathbf{a} = (a_1, \ldots, a_n)$. The merit function $M$ must rate each set of those parameters in terms of a real number. For example, $M$ could be a function of both $\mathbf{a}$, the parameters, and $f(\mathbf{x})$, the image. The problem is to find $\mathbf{a}$ such that $M(\mathbf{a}, f(\mathbf{x}))$ is maximized. Note that if $\mathbf{a}$ were some form of template function rather than a vector of parameters, the problem statement would encompass the iconic correlation techniques just covered. There is a vast literature on optimization techniques and we cannot do more than provide a cursory discussion of a few cases with examples.

Formally, the different techniques have to do with the form of the merit function $M$. A fundamental result from calculus is that if $M$ is sufficiently well behaved (i.e., has continuous derivatives), then a necessary condition for a local maximum (or minimum) is that

$$M_{a_i} = \frac{\partial M}{\partial a_i} = 0, \qquad i = 1, \ldots, n \qquad (11.1)$$

Fig. 11.2 Matching or fitting a straight-line model, $Ax + By + C = 0$ (reference), to data (input).

This condition can be exploited in many different ways. Sometimes Eqs. (11.1) are sufficiently simple that $\mathbf{a}$ can be determined analytically, as in the least-squares fitting described in Appendix 1. An approximate solution $\mathbf{a}^0$ can be iteratively adjusted by moving in the gradient direction, or direction of maximum improvement:

$$\mathbf{a}^{k} = \mathbf{a}^{k-1} + c\,\mathbf{M}_{\mathbf{a}} \qquad (11.2)$$

where $c$ is a constant and $\mathbf{M}_{\mathbf{a}} = (M_{a_1}, \ldots, M_{a_n})$ is the vector of partial derivatives of Eq. (11.1), evaluated at $\mathbf{a}^{k-1}$. This is the most elementary of several kinds of gradient (hill-climbing) techniques. Here the gradient is defined with respect to $M$ and does not mean edge strength.

If the partial derivatives are expensive to calculate, the coefficients can be perturbed (either randomly or in a structured way) and the perturbations kept if they improve $M$:

(1) $\mathbf{a}' := \mathbf{a} + \Delta\mathbf{a}$
(2) $\mathbf{a} := \mathbf{a}'$ if $M(\mathbf{a}') > M(\mathbf{a})$

A program to fit three-dimensional image data with shapes described by spherical harmonics used these techniques [Schudy and Ballard 1978]. The details of the spherical harmonics shape representation appear in Chapter 9. The fitting proceeded by the third method above. A nominal expected shape was matched to boundaries in image data. If a subsequent perturbation in one of its parameters resulted in an improvement in fit, it was kept; otherwise, a different perturbation was made. Figure 11.3 shows this fitting process for a cross section of the shape. Though parameter optimization is an important aspect of matching, we shall not pursue it further here in view of the extensive literature on the subject.
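The two iterative schemes above, gradient steps (Eq. 11.2) and perturb-and-keep steps (1) and (2), can be sketched side by side. The merit function is an illustrative assumption, not taken from the text: for the line $y = a_0 + a_1 x$ we take $M(\mathbf{a}) = -\sum_j (y_j - a_0 - a_1 x_j)^2$, so maximizing $M$ is least-squares fitting (the slope-intercept form is used instead of $Ax + By + C = 0$ only for simplicity). The step constant, the Gaussian perturbation, its scale, and the fixed seed are arbitrary choices, and none of this reproduces [Schudy and Ballard 1978].

```python
import random

def merit(a, xs, ys):
    """Illustrative merit M(a) = -sum (y - a0 - a1*x)^2 for the line y = a0 + a1*x."""
    a0, a1 = a
    return -sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))

def merit_gradient(a, xs, ys):
    """M_a = (dM/da0, dM/da1), written out analytically for this merit."""
    a0, a1 = a
    r = [y - a0 - a1 * x for x, y in zip(xs, ys)]
    return 2 * sum(r), 2 * sum(ri * x for ri, x in zip(r, xs))

def gradient_ascent(a, xs, ys, c=0.01, iterations=5000):
    """Eq. (11.2): a^k = a^(k-1) + c * M_a, with M_a evaluated at a^(k-1)."""
    for _ in range(iterations):
        g0, g1 = merit_gradient(a, xs, ys)
        a = (a[0] + c * g0, a[1] + c * g1)
    return a

def perturbation_search(M, a, step=0.1, trials=5000, seed=0):
    """Steps (1)-(2): perturb one randomly chosen parameter; keep it if M improves."""
    rng = random.Random(seed)
    a = list(a)
    best = M(a)
    for _ in range(trials):
        trial = list(a)
        i = rng.randrange(len(trial))
        trial[i] += rng.gauss(0.0, step)   # (1) a' := a + delta a
        m = M(trial)
        if m > best:                       # (2) a := a' if M(a') > M(a)
            a, best = trial, m
    return a, best
```

For the points (0, 1), (1, 3), (2, 5), (3, 7), `gradient_ascent((0.0, 0.0), ...)` converges to approximately (1, 2). The constant $c$ must be small enough for the iteration to converge; too large a value makes it diverge. The perturbation search needs no derivatives, at the price of many more evaluations of $M$.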

11.2 GRAPH-THEORETIC ALGORITHMS

The remainder of this chapter deals with methods of matching relational structures. Chapter 10 showed how to represent a relational structure containing n-ary relations as a graph with labeled arcs. Recall that the labels can have values from a continuum, and that labeled arcs could be replaced by nodes to yield a directed graph with labeled nodes.

Fig. 11.3 An example of matching as parameter optimization. (a) Initial parameter set (displayed at left as a three-dimensional surface; see Fig. 9.8). (b) Fitting process: iteratively adjust $\mathbf{a}$ based on $M$ (see text). (c) Final parameter set yields this three-dimensional surface. (See color inserts.)

Depending on the attributes of the relational structure and of the correspondence desired, the definition of a match may be more or less elegant. It is always possible to translate powerful representations such as labeled graphs or n-ary relations into computational representations which are amenable to formal treatment (such as undirected graphs). However, when graph algorithms are to be implemented with computer data structures, the freedom and power of programming languages often tempt the implementer away from pure graph theory. He can replace elegant (but occasionally restrictive and impractical) graph-theoretic concepts and operations with arbitrarily complex data structures and algorithms.

One example is the "graph isomorphism" problem, a very pure version of relational structure matching. In it, all graph nodes and arcs are unlabeled, and graphs match if there is a 1:1 and onto correspondence between the arcs and nodes of the two graphs. The lack of expressive power in these graphs and the requirement that a match be "perfect" limit the usefulness of this pure model of matching in the context of noisy input and imprecise reference structures. In practice, graph nodes may have properties with continuous ranges of values, and an arbitrarily complex algorithm determines whether nodes or arcs match. The algorithm may even access information outside the graphs themselves, as long as it returns the answer "match" or "no match." Generalizing the graph-theoretic notions in this way can obscure issues of their efficiency, power, and properties; one must steer a course between the "elegant and unusable" and the "general and uncontrollable." This section introduces some "pure" graph-theoretic algorithms that form the basis for techniques in Sections 11.3 and 11.4.
11.2.1 The Algorithms

The following are several definitions of matching between graphs [Harary 1969; Berge 1976].

Graph isomorphism. Given two graphs $(V_1, E_1)$ and $(V_2, E_2)$, find a 1:1 and onto mapping (an isomorphism) $f$ between $V_1$ and $V_2$ such that for $v_1 \in V_1$, $v_2 \in V_2$, $f(v_1) = v_2$, and for each edge of $E_1$ connecting any pair of nodes $v_1$ and $v_1' \in V_1$, there is an edge of $E_2$ connecting $f(v_1)$ and $f(v_1')$.

Subgraph isomorphism. Find isomorphisms between a graph $(V_1, E_1)$ and subgraphs of another graph $(V_2, E_2)$. This is computationally harder than isomorphism because one does not know in advance which subsets of $V_2$ are involved in isomorphisms.

"Double" subgraph isomorphisms. Find all isomorphisms between subgraphs of a graph $(V_1, E_1)$ and subgraphs of another graph $(V_2, E_2)$. This sounds harder than the subgraph isomorphism problem, but is equivalent.

A match may not conform to strict rules of correspondence between arcs and nodes (some nodes and arcs may be "unimportant"). Such a matching criterion may well be implemented as a "computational" (impure) version of one of the pure graph isomorphisms.
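The graph-isomorphism definition can be checked directly, if very expensively, by trying every 1:1 and onto node mapping. A minimal sketch for undirected graphs, assuming each graph is given as a list of nodes plus a set of edges stored as two-element frozensets (an illustrative data layout). It enumerates all $n!$ mappings, so it is only a statement of the definition in code, usable for tiny graphs.

```python
from itertools import permutations

def isomorphisms(v1, e1, v2, e2):
    """Yield every bijection f: V1 -> V2 that carries E1 exactly onto E2."""
    if len(v1) != len(v2) or len(e1) != len(e2):
        return
    for image in permutations(v2):
        f = dict(zip(v1, image))
        # Every edge of E1 must map to an edge of E2; with equal edge counts
        # and f one-to-one, the edge mapping is then onto as well.
        if all(frozenset(f[u] for u in edge) in e2 for edge in e1):
            yield f
```

For instance, the path a-b-c has exactly two isomorphisms with the path 1-2-3, and both send b to 2; it has none with a triangle, which is rejected by the edge-count test before any mapping is tried.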

Figure 11.4 shows examples of these kinds of matches.

One algorithm for finding graph isomorphism [Corneil and Gotlieb 1970] is based on the idea of separately putting each graph into a canonical form, from which isomorphism may easily be determined. For directed graphs (i.e., nonsymmetric relations) a backtrack search algorithm [Berztiss 1973] works on both graphs at once.

Two solutions to the subgraph isomorphism problem appear in [Ullman 1976]. The first is a simple enumerative search of the tree of possible matches between nodes. The second is more interesting; in it a process of "parallel iterative" refinement is applied at each stage of the search. This process is a way of rejecting node pairs from the isomorphism and of propagating the effects of such rejections; one rejected match can lead to more matches being rejected. When the iteration converges (i.e., when no more matches can be rejected at the current stage), another step in the tree search is performed (one more matching pair is hypothesized). This mixing of parallel iterative processes with tree search is useful in a variety of applications (Section 11.4.4, Chapter 12).

"Double" subgraph isomorphism is easily reduced to subgraph isomorphism via another well-known graph problem, the "clique problem." A clique of size N is a totally connected subgraph of size N (each node is connected to every other node in the clique by an arc). Finding isomorphisms between subgraphs of a graph A and subgraphs of a graph B is accomplished by forming an association graph G from the graphs A and B and finding cliques in G (for details, see Section 11.3.3). Clique finding may be done with a subgraph isomorphism algorithm; hence the reduction. Several other clique-finding algorithms exist [Ambler et al. 1975; Knodel 1968; Bron and Kerbosch 1973; Osteen and Tou 1973].

Fig. 11.4 Isomorphisms and matches. The graph (a) has an isomorphism with (b), various subgraph isomorphisms with (c), and several "double" subgraph isomorphisms with (d). It has several partial matches with (e) (and also with (b), (c), and (d)), depending on which missing or extra nodes are ignored.

11.2.2 Complexity

It is of some practical importance to be aware of the computational complexity of the matching algorithms proposed here; they may take surprising amounts of computer time. There are many accessible treatments of the computational complexity of graph-theoretic algorithms [Reingold et al. 1977; Aho, Hopcroft and Ullman 1974]. Theoretical results usually describe worst-case or average time complexity. The state of knowledge in graph algorithms is still improving; some interesting worst-case bounds have not been established.

A "hard" combinatorial problem is one that takes time (in a usual model of computation based on a serial computer) proportional to an exponential function of the length of the input. "Polynomial-time" solutions are desirable because they do not grow as fast with the size of the problem. The time to find all the cliques of a graph is in the worst case inherently exponential in the size of the input graphs, because the output is an exponential number of graphs. Both the single subgraph isomorphism problem and the "clique problem" (does there exist a clique of size N?) are NP-complete; all known deterministic algorithms run (in the worst case) in time exponential in the length of the description of the graphs involved (which must specify the nodes and arcs). Not only this, but if either of these problems (or a host of other NP-complete problems) could be solved deterministically in time polynomially related to the length of the input, it could be used to solve all the other NP problems in polynomial time.

Graph isomorphism, both directed and undirected, is at this writing in a netherworld (along with many other combinatorial problems). No polynomial-time deterministic algorithms are known to exist, but the relation of these problems to each other is not as clear-cut as it is between the NP-complete problems. In particular, finding a polynomial-time deterministic solution to one of them would not necessarily indicate anything about how to solve the other problems deterministically in polynomial time. These problems are not mutually reducible.
Certain restrictions on the graphs, for instance that they are planar (can be arranged with their nodes in a plane and with no arcs crossing), can make graph isomorphism an "easy" (polynomial-time) problem.

The average-case complexity is often of more practical interest than the worst case. Typically, such a measure is impossible to determine analytically and must be approximated through simulation. For instance, one algorithm to find isomorphisms of randomly generated graphs yields an average time that seems not exponential, but proportional to $N^3$, with $N$ the number of nodes in the graph [Ullman 1976]. Another algorithm seems to run in average time proportional to $N^2$ [Corneil and Gotlieb 1970].

All the graph problems of this section are in NP. That is, a nondeterministic algorithm can solve them in polynomial time. There are various ways of visualizing nondeterministic algorithms; one is that the algorithm makes certain significant "good guesses" from a range of possibilities (such as correctly guessing which subset of nodes from graph B is isomorphic with graph A and then only having to worry about the arcs). Another way is to imagine parallel computation; in the clique problem, for example, imagine multiple machines running in parallel, each with a different subset of nodes from the input graph. If any machine discovers a totally connected subset, it has, of course, discovered a clique. Checking whether $N$ nodes are all pairwise connected is at most a polynomial-time problem, so all the machines will terminate in polynomial time, either with success or not. Several interesting processes can be implemented with parallel computations. Ullman's algorithm uses a refinement procedure which may run in parallel between stages of his tree search, and which he explains how to implement in parallel hardware [Ullman 1976].
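The contrast between the cheap check and the expensive search can be made concrete. A minimal sketch, assuming the graph is an adjacency-set dictionary (an illustrative representation): `is_clique` is the polynomial verification step each imagined parallel machine performs, needing at most $N(N-1)/2$ adjacency lookups, while `has_clique_of_size` is a deterministic stand-in for the "good guess" that simply tries every subset in turn.

```python
from itertools import combinations

def is_clique(nodes, adj):
    """True if every pair of the given nodes is joined by an arc.
    adj maps each node to the set of its neighbours; N(N-1)/2 pair checks."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def has_clique_of_size(adj, k):
    """Deterministic replacement for the nondeterministic guess: try every
    k-subset of the nodes. There are C(n, k) of them, hence the expense."""
    return any(is_clique(s, adj) for s in combinations(adj, k))
```

On a square 1-2-3-4 with the diagonal 1-3 added, there are two triangles but no clique of four nodes.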

11.3 IMPLEMENTING GRAPH-THEORETIC ALGORITHMS

11.3.1 Matching Metrics

Matching involves quantifiable similarities. A match is not merely a correspondence, but a correspondence that has been quantified according to its "goodness." This measure of goodness is the matching metric. Similarity measures for correlation matching are lumped together as one number. In relational matching they must take into account a relational, structured form of data [Shapiro and Haralick 1979].

Most of the structural matching metrics may be explained with the physical analogy of "templates and springs" [Fischler and Elschlager 1973]. Imagine that the reference data comprise a structure on a transparent rubber sheet. The matching process moves this sheet over the input data structure, distorting the sheet so as to get the best match. The final goodness of fit depends on the individual matches between elements of the input and reference data, and on the amount of work it takes to distort the sheet. The continuous deformation process is a pretty abstraction which most matching algorithms do not implement. A computationally more tractable form of the idea is to consider the model as a set of rigid "templates" connected by "springs" (see Fig. 11.5). The templates are connected by "springs" whose "tension" is also a function of the relations between elements. A spring function can be arbitrarily complex and nonlinear; for example the "tension" in the spring can attain very high or infinite values for configurations of templates which cannot be allowed. Nonlinearity is good for such constraints as: in a picture of a face the two eyes must be essentially in a horizontal line and must be within fixed limits of distance. The quality of the match is a function of the goodness of fit of the templates locally and the amount of "energy" needed to stretch the springs to force the input onto the reference data. Costs may be imposed for missing or extra elements.

Fig. 11.5 A templates-and-springs model of a face.

The template match functions and spring functions are general procedures, thus the templates may be more general than pure iconic templates. Further, matches may be defined not only between nodes and other nodes, but between nodes and image data directly. Thus the templates-and-springs formalism is workable for "cross-representational" matching. The mechanism of minimizing the total cost of the match can take several forms; more detailed examples follow in Section 11.4.

Equation (11.3) is a general form of the templates-and-springs metric. TemplateCost measures dissimilarity between the input and the templates, and SpringCost measures the dissimilarity between the matched input elements' relations and the reference relations between the templates. MissingCost measures the penalties for missing elements. $F(\cdot)$ is the mapping from templates of the reference to elements of the input data. $F$ partitions the reference templates into two classes, those found {Found in Refer} and those not found {Missing in Refer} in the input data. If the input data are symbolic they may be similarly partitioned. The general metric is

$$\text{Cost} = \sum_{d \in \{\text{Found in Refer}\}} \text{TemplateCost}(d, F(d)) + \sum_{(d,e) \in \{\text{Found in Refer} \times \text{Found in Input}\}} \text{SpringCost}(F(d), F(e)) + \sum_{c \in \{\text{Missing in Refer}\} \cup \{\text{Missing in Input}\}} \text{MissingCost}(c) \qquad (11.3)$$

Equation (11.3) may be written as one sum of generalized SpringCosts in which the template properties are included (as 1-ary relations), as are "springs" involving missing elements.
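Eq. (11.3) can be evaluated directly once the three cost functions are supplied. This is a minimal sketch under several assumptions: the cost functions are passed in as Python callables; $F$ is a dictionary from found reference templates to input elements, so templates absent from it are "missing"; and the spring sum is read as ranging over ordered pairs of distinct matched templates, with the spring function also given the two reference templates so that it can compare reference and input relations. The one-dimensional "face" in the usage lines (two eyes and a mouth, with positions on a line) is a hypothetical toy example.

```python
def match_cost(refs, inputs, F, template_cost, spring_cost, missing_cost):
    """Templates-and-springs cost in the spirit of Eq. (11.3).

    refs: reference templates; inputs: input elements;
    F: dict mapping each found reference template to an input element.
    """
    found = [d for d in refs if d in F]
    matched_inputs = set(F.values())
    # Local fit of each found template to the element it was matched with.
    cost = sum(template_cost(d, F[d]) for d in found)
    # Springs: one term per ordered pair of distinct matched templates.
    cost += sum(spring_cost(d, e, F[d], F[e])
                for d in found for e in found if d != e)
    # Unmatched elements on either side are charged a penalty.
    missing = [d for d in refs if d not in F]
    missing += [x for x in inputs if x not in matched_inputs]
    cost += sum(missing_cost(c) for c in missing)
    return cost

# Toy usage: positions on a line; springs compare reference and input offsets.
ref_pos = {"left_eye": 0.0, "right_eye": 2.0, "mouth": 1.0}
inp_pos = {"a": 0.1, "b": 2.0, "z": 9.0}
F = {"left_eye": "a", "right_eye": "b"}
tc = lambda d, x: abs(ref_pos[d] - inp_pos[x])
sc = lambda d, e, x, y: abs((ref_pos[e] - ref_pos[d]) - (inp_pos[y] - inp_pos[x]))
mc = lambda c: 1.0
total = match_cost(list(ref_pos), list(inp_pos), F, tc, sc, mc)
```

Here the two template terms contribute 0.1, the two springs about 0.1 each, and the unmatched "mouth" template and the extra input element "z" contribute 1.0 each, for a total of about 2.3.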



As with correlation metrics, there are normalization issues involved with structural matching metrics. The number of elements matched may affect the ultimate magnitude of the metric. For instance, if springs always have a finite cost, then the more elements that are matched, the higher the total spring energy must be; this should probably not be taken to imply that a match of many elements is worse than a match of a few. Conversely, suppose that relations which agree are given positive "goodness" measures, and a match is chosen on the basis of the total "goodness." Then unless one is careful, the sheer number of possibly mediocre relational matches induced by matching many elements may outweigh the "goodness" of an elegant match involving only a few elements. On the other hand, a small, elegant match of a part of the input structure with one particular reference object may leave much of the search structure unexplained. This good "submatch" may be less helpful than a match that explains more of the input. To some extent the general metric (Eq. 11.3) copes with this by acknowledging the "missing" category of elements.

If the reference templates actually contain iconic representations of what the input elements should look like in the image, a TemplateCost can be associated with a template and a location in the image by

$$\text{TemplateCost}(\text{Template}, \text{Location}) = 1 - (\text{normalized correlation metric between template shape and input image at the location})$$

If the match is, for instance, to match reference descriptions of a chair with an input data structure, a typical "spring" might be that the chair seat must be supported by its legs. Thus if $F$ is the association function mapping reference elements such as LEG or TABLETOP to input elements,

$$\text{SpringCost}_1(F(\text{LEG}), F(\text{TABLETOP})) = \begin{cases} 0 & \text{if } F(\text{LEG}) \text{ appears to support } F(\text{TABLETOP}) \\ 1 & \text{if } F(\text{LEG}) \text{ does not appear to support } F(\text{TABLETOP}) \end{cases}$$
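The iconic TemplateCost above can be sketched with the usual normalized cross-correlation coefficient. This assumes the template and the image patch at the candidate location have already been extracted as equal-length lists of gray values; the coefficient used here is one common choice of "normalized correlation metric", and the function names are hypothetical.

```python
import math

def normalized_correlation(t, p):
    """Normalized cross-correlation of two equal-length intensity lists, in [-1, 1].
    Returns 0.0 when either list is constant (the coefficient is then undefined)."""
    n = len(t)
    mt, mp = sum(t) / n, sum(p) / n
    num = sum((a - mt) * (b - mp) for a, b in zip(t, p))
    den = math.sqrt(sum((a - mt) ** 2 for a in t) * sum((b - mp) ** 2 for b in p))
    return num / den if den else 0.0

def template_cost(template, patch):
    """TemplateCost = 1 - normalized correlation: 0 for a perfect match, up to 2."""
    return 1.0 - normalized_correlation(template, patch)
```

Because the coefficient subtracts means and divides by spreads, a patch that is a brighter, higher-contrast copy of the template (say [2, 4, 6] against [1, 2, 3]) still costs 0, while a reversed intensity profile costs 2.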

For quantified relations, one might have

$$\text{SpringCost}_2 = \text{number of standard deviations from the canonical mean value for this relation}$$

Another version of SpringCost$_2$ is the following [Barrow and Popplestone 1971]:

$$\text{SpringCost}_2 = \frac{\sum \text{SpringCosts of properties (unary) and binary relations}}{\text{total number of unary and binary springs}} + \frac{\text{EmpiricalConstant}}{\text{total number of reference elements matched}}$$

The first term measures the average badness of matches between properties (unary relations) and relations between regions. The second term is inversely proportional to the number of regions that are matched, effectively increasing the cost of matches that explain less of the input.
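The two-term cost above is simple arithmetic. A minimal sketch, assuming the individual unary and binary spring costs have already been computed into a list; EmpiricalConstant is a free parameter whose value must be chosen by the user, and returning infinity for an empty match is an added convention, not part of the formula.

```python
def barrow_popplestone_cost(spring_costs, n_reference_matched, empirical_constant):
    """Average badness of the unary and binary springs, plus a term that
    penalizes matches explaining few reference elements."""
    if not spring_costs or n_reference_matched == 0:
        return float("inf")   # assumption: an empty match is treated as worst
    average_badness = sum(spring_costs) / len(spring_costs)
    return average_badness + empirical_constant / n_reference_matched
```

With spring costs [0.5, 1.5] and an EmpiricalConstant of 2, matching 4 reference elements gives 1.0 + 0.5 = 1.5, while matching 8 elements with equally good springs gives 1.25, so the larger explanation is preferred.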

11.3.2 Backtrack Search

Backtrack search is a generic name for a type of potentially exhaustive search organized in stages; each processing stage attempts to extend a partial solution derived in the previous stage. Should the attempt fail, the search "backtracks" to the most recent partial solution, from which a new extension is attempted. The technique is basic, amounting to a depth-first search through a tree of partial solutions (Fig. 11.6). Backtracking is a pervasive control structure in artificial intelligence, and through the years several general classes of techniques have evolved to make the basic, brute-force backtrack search more efficient.

Fig. 11.6 The graph of (a) is to be matched in (b), with arcs all being unlabeled but nodes having properties indicated by their shapes. (c) is the tree of solutions built by a backtrack algorithm.

Example: Graph Isomorphisms

Given two graphs $X = (V_X, E_X)$ and $Y = (V_Y, E_Y)$, without loss of generality let $V_X = V_Y = \{1, 2, \ldots, n\}$, and let $X$ be the reference graph, $Y$ the input graph. The isomorphism is given by: if $i \in V_X$, the corresponding node under the isomorphism is $F(i) \in V_Y$. In the algorithm, $S$ is the set of nodes accounted for in $Y$ by a partial solution. $k$ gives the current level of the search in the tree of partial solutions, the number of nodes in the current partial solution, and the node of $X$ whose match in $Y$ is currently being sought. $v$ is a node of $Y$ currently being considered to extend the current partial solution. As written, the algorithm finds all isomorphisms. It is easily modified to quit after finding the first.

Algorithm 11.1 Backtrack Search for Directed Graph Isomorphism

Recursive Procedure DirectedGraphIsomorphisms(S, k);
begin
    if S = V_Y then ReportAsIsomorphism(F)
    else for all v ∈ (V_Y − S) do
        if Match(k, v) then
        begin
            F(k) := v;
            DirectedGraphIsomorphisms(S ∪ {v}, k + 1);
        end;
end;
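Algorithm 11.1 transcribes fairly directly into Python. A minimal sketch, assuming nodes are numbered 1..n in both graphs, each directed graph is given as a set of (tail, head) pairs, and there are no self-loops; the local `match` performs the edge test that the paragraph following the algorithm describes, and isomorphisms are collected in a list rather than printed by ReportAsIsomorphism.

```python
def directed_graph_isomorphisms(n, ex, ey):
    """All isomorphisms F from X = ({1..n}, ex) onto Y = ({1..n}, ey)."""
    vy = set(range(1, n + 1))
    F = {}
    found = []

    def match(k, v):
        # v may correspond to k if, for every already-matched i < k, the arcs
        # between i and k in X mirror the arcs between F(i) and v in Y.
        for i in range(1, k):
            if ((i, k) in ex) != ((F[i], v) in ey):
                return False
            if ((k, i) in ex) != ((v, F[i]) in ey):
                return False
        return True

    def search(S, k):
        if S == vy:
            found.append(dict(F))          # ReportAsIsomorphism(F)
            return
        for v in sorted(vy - S):
            if match(k, v):
                F[k] = v
                search(S | {v}, k + 1)
                del F[k]                   # clear the tentative assignment

    search(set(), 1)
    return found
```

For example, the directed path 1 → 2 → 3 matches the path 2 → 1 → 3 in exactly one way, {1: 2, 2: 1, 3: 3}, and a directed 3-cycle matched against itself yields its three rotations.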

ReportAsIsomorphism could print or save the current value of $F$, the global structure recording the current solution. Match(k, v) is a procedure that tests whether $v \in V_Y$ can correspond to $k \in V_X$ under the isomorphism so far defined by $F$. Let $X_k$ be the subgraph of $X$ with vertices $\{1, 2, \ldots, k\}$. The procedure Match must check, for $i < k$, whether $(i, k)$ is an edge of $X_k$ iff $(F(i), v)$ is an edge of $Y$, and whether $(k, i)$ is an edge of $X_k$ iff $(v, F(i))$ is an edge of $Y$.

Improving Backtrack Search

Several techniques are useful in improving the efficiency of backtrack search [Bittner and Reingold 1975]:

1. Branch pruning. All techniques of this variety examine the current partial solution and prune away descendants that are not viable continuations of the solution. Should none exist, backtracking can take place immediately.
2. Branch merging. Do not search branches of the solution tree isomorphic with those already searched.
3. Tree rearrangement and reordering. Given pruning capabilities, more nodes are likely to be eliminated by pruning if there are fewer choices to make early in the search (partial solution nodes of low degree should be high in the search tree). Similarly, search first those extensions to the current solution that have the fewest alternatives.
4. Branch and bound. If a cost may be assigned to solutions, standard techniques such as heuristic search and the A* search algorithm [Nilsson 1980] (Section 4.4) may be employed to allow the search to proceed on a "best-first" rather than a "depth-first" basis.

For extensions of these techniques, see [Haralick and Elliott 1979].

11.3.3 Association Graph Techniques

Generalized Structure Matching

A general relational structure "best match" is less restricted than graph isomorphism, because nodes or arcs may be missing from one or the other graph. Also, it is more general than subgraph isomorphism because one structure may not be exactly isomorphic to a substructure of the other. A more general match consists of a set of nodes from one structure and a set of nodes from the other and a 1:1 mapping between them which preserves the compatibilities of properties and relations. In other words, corresponding nodes (under the node mapping) have sufficiently similar properties, and corresponding sets under the mapping have compatible relations.

The two relational structures may have a complex makeup that falls outside the normal purview of graph theory. For instance, they may have parameterized properties attached to their nodes and edges. The definition of whether a node matches another node and whether two such node matches are mutually compatible can be determined by arbitrary procedures, unlike the much simpler criteria used in pure graph isomorphism or subgraph isomorphism, for example.
Recall that the various graph and subgraph isomorphisms rely heavily on a 1:1 match, at least locally, between arcs and nodes of the structures to be matched. However, the idea of a "best match" may make sense even in the absence of such perfect correspondences. The association graph defined in this section is an auxiliary data structure produced from two relational structures to be matched. The beauty of the association graph is that it is a simple, pure graph-theoretic structure which is amenable to pure graph-theoretic algorithms such as clique finding. This is useful for several reasons.


1. It takes relational structure matching from the ad hoc to the classical domain.
2. It broadens the base of people who are producing useful algorithms for structure matching. If the rather specialized relational structure matching enterprise is reducible to a classical graph-theoretical problem, then everyone working on the classical problem is also working indirectly on structure matching.
3. Knowledge about the computational complexity of classical graph algorithms illuminates the difficulty of structure matching.
Clique Finding for Generalized Matching

Let a relational structure be a set of elements $V$, a set of properties (or more simply unary predicates) $P$ defined over the elements, and a set of binary relations (or binary predicates) $R$ defined over pairs of the elements. An example of a graph representation of such a structure is given in Fig. 11.7.

Given two structures defined by $(V_1, P, R)$ and $(V_2, P, R)$, say that "similar" and "compatible" actually mean "the same." Then we construct an association graph $G$ as follows [Ambler et al. 1975]. For each $v_1$ in $V_1$ and $v_2$ in $V_2$, construct a node of $G$ labeled $(v_1, v_2)$ if $v_1$ and $v_2$ have the same properties [$p(v_1)$ iff $p(v_2)$ for each $p$ in $P$]. Thus the nodes of $G$ denote assignments, or pairs of nodes, one each from $V_1$ and $V_2$, which have similar properties. Now connect two nodes $(v_1, v_2)$ and $(v_1', v_2')$ of $G$ if they represent compatible assignments according to $R$, that is, if the pairs satisfy the same binary predicates [$r(v_1, v_1')$ iff $r(v_2, v_2')$ for each $r$ in $R$].

A match between $(V_1, P, R)$ and $(V_2, P, R)$, the two relational structures, is just a set of assignments that are all mutually compatible. The "best match" could well be taken to be the largest set of assignments (node correspondences) that were all mutually compatible under the relations. But this in the association graph $G$ is just the largest totally connected (completely mutually compatible) set of nodes. It is a clique. A clique to which no new nodes may be added without destroying the clique property is a maximal clique. In this formulation of matching, larger cliques are taken to indicate better matches, since they account for more nodes.

Fig. 11.7 A graph representation of a relational structure. Elements (nodes) $v_1$ and $v_3$ have property $p_1$, $v_2$ and $v_4$ have property $p_2$, and the arcs between nodes indicate that the relation $r_1$ holds between $v_1$ and $v_2$ and between $v_2$ and $v_3$, and $r_2$ holds between $v_3$ and $v_4$ and between $v_4$ and $v_1$.
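The association-graph construction of [Ambler et al. 1975], in the strict "same properties, same relations" form described above, can be sketched as follows. The data layout is an illustrative assumption: each structure is a dictionary from element to its set of property names, plus a set of (relation, a, b) triples for the binary relations. The rule that two assignments sharing an element are never compatible is also an added assumption, made so that every clique corresponds to a 1:1 mapping.

```python
from itertools import combinations

def association_graph(props1, rels1, props2, rels2):
    """Nodes: pairs (v1, v2) with identical property sets.
    Arcs: pairs of nodes whose binary relations agree for every relation name."""
    names = {r for r, _, _ in rels1} | {r for r, _, _ in rels2}
    nodes = [(a, b) for a in props1 for b in props2 if props1[a] == props2[b]]

    def compatible(n1, n2):
        (a, b), (a2, b2) = n1, n2
        if a == a2 or b == b2:          # assumption: each element assigned at most once
            return False
        # r(a, a2) iff r(b, b2), checked in both directions for every relation r.
        return all(((r, a, a2) in rels1) == ((r, b, b2) in rels2) and
                   ((r, a2, a) in rels1) == ((r, b2, b) in rels2)
                   for r in names)

    adj = {n: set() for n in nodes}
    for n1, n2 in combinations(nodes, 2):
        if compatible(n1, n2):
            adj[n1].add(n2)
            adj[n2].add(n1)
    return adj
```

The result is an adjacency-set dictionary, so any clique finder that accepts that representation can be run on it directly. Replacing the equality tests with similarity tests gives the "similar enough" variant discussed in the text.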


Thus the best matches are determined by the largest maximal cliques in the association graph. Figure 11.8 shows an example: Certain subfeatures of the objects have been selected as "primitive elements" of the objects, and appear as nodes (elements) in the relational structures. To these nodes are attached properties, and between them can exist relations. The choice of primitives, properties, and relations is up to the designer of the representation. Here the primitives of the representation correspond to edges and corners of the shape.

The association graph is shown in Fig. 11.8e. Its nodes correspond to pairs of nodes, one each from A and B, whose properties are similar. [Notice that there is no node in the association graph for (6, 6').] The arcs of the association graph indicate that the endpoints of the arc represent compatible associations. Maximal cliques in the association graph (shown as sets of nodes with the same shape) indicate sets of consistent associations. The largest maximal clique provides the node pairings of the "best match."

In the example construction, the association graph is formed by associating nodes with exactly the same properties (actually unary predicates), and by allowing as compatible associations only those with exactly the same relations (actually binary predicates). These conditions are easy to state, but they may not be exactly what is needed. In particular, if the properties and relations may take on ranges of values greater than the binary "exists" and "does not exist," then a measure of similarity must be introduced to define when node properties are similar enough for association, and when relations are "similar enough" for compatibility. Arbitrarily complex functions can decide whether properties and relations are similar. As long as the function answers "yes" or "no," the complexity of its computations is irrelevant to the matching algorithm.

The following recursive clique-finding algorithm builds up cliques a node at a time [Ambler et al. 1975]. The search tree it generates has states that are ordered pairs (set of nodes chosen for a clique, set of nodes available for inclusion in the clique). The root of the tree is the state (∅, all graph nodes), and at each branch a choice is made whether to include or not to include an eligible node in the clique. (If a node is eligible for inclusion in clique X, then each clique including X must either include the node or exclude it.)

Algorithm 11.2: Clique-Finding Algorithm

Comment Nodes is the set of nodes in the input graph.
Comment Cliques(X, Y) takes as arguments a clique X, and Y, a set of nodes that includes X. It returns all cliques that include X and are included in Y. Cliques(∅, Nodes) finds all cliques in the graph.

Cliques(X, Y) :=
    if no node in Y − X is connected to all elements of X
    then {X}
    else Cliques(X ∪ {y}, Y) ∪ Cliques(X, Y − {y})
    where y is a node of Y − X connected to all elements of X.
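Algorithm 11.2 can be sketched in Python almost line for line. The graph is assumed to be an adjacency-set dictionary with sortable node names, and y is simply the first eligible node in sorted order, since the algorithm leaves that choice open. The `maximal_cliques` filter is an added post-processing step for illustration, not part of Algorithm 11.2.

```python
def cliques(X, Y, adj):
    """All cliques that include X and are included in Y (Algorithm 11.2).
    X and Y are frozensets; adj maps each node to the set of its neighbours."""
    eligible = sorted(y for y in Y - X if all(y in adj[x] for x in X))
    if not eligible:
        return [X]
    y = eligible[0]
    # Branch: cliques that include y, then cliques that exclude y.
    return cliques(X | {y}, Y, adj) + cliques(X, Y - {y}, adj)

def maximal_cliques(adj):
    """Keep only the cliques that no outside node can extend."""
    found = cliques(frozenset(), frozenset(adj), adj)
    return [c for c in found
            if not any(all(z in adj[x] for x in c) for z in set(adj) - c)]
```

Each clique is produced exactly once, because the include and exclude branches for y lead to disjoint results; the empty set counts as a clique. For a triangle 1-2-3 with a pendant arc 3-4, `cliques` returns ten cliques and `maximal_cliques` keeps {1, 2, 3} and {3, 4}. The number of cliques can grow exponentially, which is why the pruning tests under Extensions matter.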

Fig. 11.8 Clique-finding example. Entities to be matched are given in (a) (reference) and (b) (input). The relational structures corresponding to them are shown in (c) and (d) (legend: property "corner," property "short," relation "next"). The resulting association graph is shown in (e) with its largest cliques indicated by node shapes.

Extensions

Modifications to the clique-finding algorithm extend it to finding maximal cliques and finding largest cliques. To find largest cliques, perform an additional test to stop the recursion in Cliques if the size of X plus the number of nodes in Y − X connected to all of X becomes less than k, which is initially set to the size of the largest possible clique. If no cliques of size k are found, decrement k and run Cliques with the new k. To find maximal cliques, at each stage of Cliques, compute the set Y' = {z ∈ Nodes : z is connected to each node of Y}. Since any maximal clique must include Y', searching a branch may be terminated should Y' not be contained in Y, since Y can then contain no maximal cliques.

The association graph may be searched not for cliques, but for r-connected components. An r-connected component is a set of nodes such that each node is connected to at least r other nodes of the set. A clique of size n is an (n − 1)-connected component. Fig. 11.9 shows some examples. The r-connected components generalize the notion of clique. An r-connected component of N nodes in the association graph indicates a match of N pairs of nodes from the input and reference structures, as does an N-clique. Each matching pair has similar properties, and each pair is compatible with at least r other matches in the component.

Whether or not the r-connected component definition of a match between two structures is useful depends on the semantics of "compatibility." For instance, if all relations are either compulsory or prohibited, clearly a clique is called for. If the relations merely give some degree of mutual support, perhaps an r-connected component is the better definition of a match.
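Both extensions can be sketched compactly. The graph is again assumed to be an adjacency-set dictionary with sortable node names, and the names are hypothetical. `largest_cliques` adds the size-bound test to the recursion of Algorithm 11.2, starting k at the number of nodes (a safe but loose choice for "the size of the largest possible clique") and decrementing it. `is_r_connected` checks the r-connected component definition for a given node set; it does not search for such sets.

```python
def largest_cliques(adj):
    """Largest cliques, using the bound: abandon a branch when |X| plus the
    number of nodes still eligible to join X falls below k."""
    def search(X, Y, k):
        eligible = sorted(y for y in Y - X if all(y in adj[x] for x in X))
        if len(X) + len(eligible) < k:
            return []                       # this branch cannot reach size k
        if not eligible:
            return [X]
        y = eligible[0]
        return search(X | {y}, Y, k) + search(X, Y - {y}, k)

    for k in range(len(adj), 0, -1):        # decrement k until something is found
        found = [c for c in search(frozenset(), frozenset(adj), k) if len(c) == k]
        if found:
            return found
    return []

def is_r_connected(nodes, adj, r):
    """True if every node of the set is joined to at least r others in the set."""
    nodes = set(nodes)
    return all(len((adj[v] & nodes) - {v}) >= r for v in nodes)
```

On the triangle 1-2-3 with a pendant arc 3-4, the only largest clique is {1, 2, 3}; the triangle is 2-connected, while the full four-node set is only 1-connected because node 4 has a single neighbor.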

11.4 MATCHING IN PRACTICE

This section illustrates some principles of matching with examples from the computer vision literature.

Fig. 11.9 r-connected components. (a) A 5-clique (which is 4-connected). (b) A 3-connected set of 5 nodes. (c) A 1-connected set of 5 nodes.

11.4.1 Decision Trees

Hierarchical decision tree matching with ad hoc metrics is a popular way to identify input data structures as instances of reference models and thus classify the input instances [Nevatia 1974; Ambler et al. 1975; Winston 1975]. Decision trees are indicated when it is predictable that certain features are more reliably extracted than others and that certain relations are either easier to sense or more necessary to the success of a match.

Winston and Nevatia both compare matches with a "weighted sum of differences" metric that reflects cumulative differences between the parameters of corresponding elements and relations in the reference and input data. In addition, Nevatia does parameter fitting; his reference information includes geometrical information.
Matching Structural Descriptions

Winston is interested in matching such structures as appear in Fig. 11.10b. The idea is to recognize instances of structural concepts such as "arch" or "house," which are relational structures of primitive blocks (Fig. 11.10a) [Winston 1975]. An important part of the program learns the concept in the first place; only the matching aspect of the program is discussed here. His system has the pleasant property of representational uniqueness: reference and input data structures that are identical up to the resolution of the descriptors used by the program have identical representations. Matching is easy in this case. Reflections of block structures can be recognized because the information available about relations (such as LEFT-OF and IN-FRONT-OF) includes their OPPOSITE (i.e., RIGHT-OF and BEHIND). The program thus can recognize various sorts of symmetry by replacing all input data structure relations by their relevant opposite, then comparing with the reference.

The next most complicated matching task after exact or symmetric matches is to match structures in isolation. Here the method is to match the input data sequentially against the whole reference data catalog of structures and determine which match is best (which difference description is most inconsequential). Easily computed scene characteristics can rule out some candidate models immediately.

The models contain arcs such as MUST-BE and MUST-NOT, expressing mandatory or forbidden relations. A match is not allowed between a description and a model if one of the strictures is violated. For instance, the program may reject a "house" immediately as not being a "pedestal," "tent," or "arch," since the pedestal top must be a parallelepiped, both tent components must be wedges, and the house is missing a component to support the top piece that is needed in the arch. These outright rejections are in a sense easy cases; it can also happen that more than one model matches some scene description. To determine the best match in this case, a weighted sum of differences is computed to express the amount of difference.
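The MUST-BE/MUST-NOT rejection followed by a weighted sum of differences can be sketched as follows. This is only an illustration, not Winston's program: descriptions are assumed to be sets of (relation, subject, object) triples, each model is a hypothetical dict with `relations`, `must_be`, and `must_not` sets, and the default weight of 1.0 per differing relation is an arbitrary choice.

```python
def match_cost(description, model, weights):
    """None if a MUST-BE or MUST-NOT stricture is violated; otherwise a
    weighted sum over relations present in only one of the two structures."""
    if not model["must_be"] <= description:
        return None  # a mandatory relation is missing
    if model["must_not"] & description:
        return None  # a forbidden relation is present
    differences = description ^ model["relations"]
    return sum(weights.get(rel, 1.0) for rel in differences)


def best_model(description, catalog, weights=None):
    """Name of the admissible model with the smallest difference, or None."""
    weights = weights or {}
    costs = {name: match_cost(description, m, weights) for name, m in catalog.items()}
    admissible = {name: c for name, c in costs.items() if c is not None}
    return min(admissible, key=admissible.get) if admissible else None
```

This corresponds to the structures-in-isolation case. For the complex scenes discussed next, where essential properties may be hidden, the `must_be` test would have to be relaxed while the `must_not` test is kept.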
The next harder case is to match structures in a complex scene. The issue here is what to do about evidence that is missing through obscuration. Two heuristics help:
Fig. 11.10 (a) Several arches and non-arches (labeled "Arch" and "Near miss"). (b) The computer-generated arch description to be used for matching.

1. Objects that seem to have been stacked and could be the same type are of the same type.
2. Essential model properties may be hidden in the scene, so the match should not be aborted because of missing essential properties (though the presence of forbidden properties is enough to abort a match).

This latter rule is equivalent to Nevatia's rules about connectivity difference and missing input instance parts (see below). In terms of the general structure metric introduced earlier, neither Winston nor Nevatia penalizes the match for missing elements or relations in the reference data. One result of this is that the best match is sometimes missed in favor of the first possible match. Winston suggests that complex scenes be analyzed by identifying subscenes and subtracting out the identified parts, as was done by Roberts.

Winston's program can learn shortcuts in matching strategy by itself; it builds for itself a similarity network relating models whose differences are slight. If a reference model does not quite fit an input structure, the program can make an intelligent choice of the next model to try. A good choice is a model that has only minor differences with the first. This self-organization and cataloging of the models according to their mutual differences is a powerful way to use matching work that is already performed to guide further search for a good match.

Backtrack Search

Nevatia addresses a domain of complex articulated biological-like forms (hands, horses, dolls, snakes) [Nevatia 1974]. His strategy is to segment the objects into parts with central axes and "cross sections" (not unlike generalized cylinders, except that they are largely treated in two dimensions). The derived descriptions of objects contain the connectivity of subparts, and descriptions of the shape and joint types of the parts. Matching is needed to compare descriptions and find differences, which can then be explained or used to abort the match. Partial matches are important (as in most real-world domains) because of occlusions, noise, and so on.

A priori ideas as to the relative importance of different aspects of structures are used to impose a hierarchical order on the matching decision tree. Nevatia finds this heuristic approach more appealing than a uniform, domain-independent one such as clique finding. His system knows that certain parts of a structure are more important than others, and uses them to index into the reference data catalog containing known structures. Thus relevant models for matching may be retrieved efficiently on the basis of easily computed functions of the input data. The models are generated by the machine by the same process that later extracts descriptions of the image for recognition. Several different models are stored for the same view of the same object, because his program has no idea of model equivalence and cannot always extract the same description.
The matching process is basically a depth-first tree search, with initial choices being constrained by "distinguished pieces." These are important pieces of the image which first dictate the models to be tried in the match, and then constrain the possible matches of other parts.

There is a topological and a geometrical component to the match. The topological part is based on the connectivity of the "stick figure" that underlies the representation. The geometrical part matches the more local characteristics of individual pieces. Consider Nevatia's example of matching a doll with stored reference descriptions, including those of a doll and a horse.

By a process not concerning us here, the doll image is segmented into pieces as shown in Fig. 11.11. From this, before any matching is done, a connection graph of pieces is formed, as shown in Fig. 11.12. This connection graph is topologically the same as the reference connection graph for the doll, which looks as one would expect. In both reference and input, "distinguished pieces" are identified by their large size. During reference formation time, the two largest pieces were the head and the trunk, and these are the distinguished pieces in the reference. The same pieces are picked as distinguished in the instance to be matched; consistent with the hierarchical decision-tree style, distinguished pieces are matched first.

Fig. 11.11 A view of a doll, with derived structure.

Because of noise, connections at joints may be missed; because of the nature of the objects, extra joints are hardly ever produced. Thus there is a domain-dependent rule that an input piece with two other pieces connected at a single joint (a "two-ended piece") cannot match a one-ended reference piece, although the reverse is possible.

On the basis of the distinguished pieces in the input instance, the program decides that the instance could be a doll or a horse. Both these possibilities are evaluated carefully; Fig. 11.13 shows a schematic view of the process. Piece-match evaluation must be performed at the nodes of the tree to determine which pieces at a joint should be made to correspond. The final best match between the doll input and the horse reference model is diagrammed in Fig. 11.14. This match is not as good as the match between the doll input and the doll reference.
Fig. 11.12 Connection graph of the doll.

The final choice of matches is made with a version of the general relational structure matching metric (Eq. 11.3). It takes into account the connectivity relations, which are the same in this case, and the quality of the individual piece matches. In the doll-horse match, more reference parts are missing, but this can happen if parts are hidden in a view, and does not count against the match. The doll-doll match is preferred on the basis of piece matching, but both matches are considered possible.

In summary, the selection of best match proceeds roughly as follows: unacceptable differences are first sought (not unlike the Winston system). The number of input pieces not matched by the reference is an important number (not vice versa, because of the possibility of hidden input parts). Only elongated, large parts are considered for this determination, to eliminate small "noise" patches. The match with fewest unmatched input pieces is chosen.

[Figure: tree of trial matches A:B, annotated with the matcher's inferences, e.g., "extra input piece matches unmatched reference arm," "no matches now for instance leg," "head (4) : leg (4') match very poor," "leg matched despite shadows," and "both branches lead to correct match."]

Fig. 11.13 A pictorial guide to the combinations tried by the matcher establishing the best correspondence of the doll input with the doll reference. The graphic shapes are purely pedagogical: the program deals with symbolic connectivity information and geometric measurements. Inferences and discoveries made by the program while matching are given in the diagram. A:B means that structure A is matched with structure B, with the numbered substructures of A matching their primed counterparts in B.

Fig. 11.14 The best match of the doll input with the horse reference model. One doll arm is unmatched, as are the horse head and two legs.

If no deciding structural differences exist, the quality of piece matches determines the quality of the match. These correspond to the template cost term in Eq. (11.3). If a "significant" difference in match error exists, the better match is exclusively selected; if the difference is not so great as that, the better match is merely preferred.

Piece matching is a subprocess of joint matching. The difference in the number of pieces attached at the ends of the piece to be matched is the connectivity difference. If the object piece has more pieces connected to it than the model piece, the match is a poor one; since pieces may not be visible in a view, the converse is not true. If one match gives fewer excess input pieces, it is accepted at this point. If not, the goodness of the match is computed as a weighted sum of width difference, length-to-width ratio difference, and difference in acuteness of the generalized cylinders (Chapter 9) forming the pieces. The weighted sum is thresholded to yield a final "yes or no" match result. Shadowing phenomena are accommodated by allowing the input piece to be narrower than the reference model piece with no penalty. The error function weights are derived empirically; one would not expect the viewing angle to affect the width of a piece seriously, for example, but it could affect its length. Piece axis shapes (what sort of space curve they are) are not used, for domain-dependent reasons, nor are cross-section functions (aside from a measure of "acuteness" for cone-like generalized cylinders).
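The piece-level test can be sketched as a thresholded weighted sum. This is a hypothetical illustration. The `Piece` fields, the weights, and the threshold are invented here (Nevatia's weights were derived empirically), and "acuteness" is reduced to a single number.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    width: float
    length: float
    acuteness: float  # hypothetical scalar measure of cone-likeness
    n_connected: int  # number of pieces attached at its ends


# Hypothetical weights and threshold, chosen only for illustration.
W_WIDTH, W_RATIO, W_ACUTE = 1.0, 0.5, 0.5
THRESHOLD = 1.0


def connectivity_difference(inp, ref):
    """Excess input connections count against a match; excess reference
    connections do not, since attached reference pieces may be hidden."""
    return max(0, inp.n_connected - ref.n_connected)


def piece_error(inp, ref):
    # Shadowing: an input piece narrower than the reference costs nothing.
    width_diff = max(0.0, inp.width - ref.width)
    ratio_diff = abs(inp.length / inp.width - ref.length / ref.width)
    acute_diff = abs(inp.acuteness - ref.acuteness)
    return W_WIDTH * width_diff + W_RATIO * ratio_diff + W_ACUTE * acute_diff


def pieces_match(inp, ref):
    """Final yes/no decision by thresholding the weighted error."""
    return piece_error(inp, ref) <= THRESHOLD
```

The one-sided width term is the only place where shadowing enters. A piece that looks thinner than its model because part of it is in shadow is not penalized, but a piece that looks wider is.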
11.4.2 Decision Tree and Subgraph Isomorphism

A robotics program for versatile assembly [Ambler et al. 1975] uses matching to identify individual objects on the basis of their boundaries, and to match several individual blobs on a screen with a reference model containing the known location of multiple objects in the field of view. In both cases the best subgraph isomorphism between input and reference data structures is found, when necessary, by the clique-finding technique (Algorithm 11.2).

The input data to the part recognizer consist of silhouettes of parts with outlines of piecewise linear and circular segments. A typical set of shapes to be recognized might be stored in terms of boundary primitives as shown in Fig. 11.15a, with matchable and unmatchable scenes shown in Fig. 11.15b.

Generally, the matching process works on hierarchical structures which capture increasing levels of detail about the objects of interest. The matching works its way down the hierarchy, from high-level, easily computable properties such as size down to difficult properties such as the arrangements of linear segments in a part outline. After this decision-tree preprocessing, all possible matches are computed by the clique-finding approach to subgraph isomorphism. A scene can be assigned a number of interpretations, including those of different views of the same part. The hierarchical organization means that complicated properties of the scene are not computed unless they are needed by the matcher. Once computed they are never recomputed, since they are stored in accessible places for later retrieval if needed. Each matching level produces multiple interpretations; ambiguity is treated with backtracking. The system recognizes rotational and translational invariance, but must be taught different views of the same object in its different gravitationally stable states. It treats these different states basically as different objects.
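The caption of Fig. 11.15 notes that a square can match the square model in four ways through rotations, and that gross properties can reject matches cheaply. A minimal sketch of both ideas follows. It is not the Ambler et al. system, which uses clique finding. The boundary encoding as a cyclic sequence of (segment length, turning angle) primitives, the class name `Outline`, and the use of perimeter as the cheap gross property are all assumptions. `functools.cached_property` stands in for "once computed, never recomputed."

```python
from functools import cached_property


class Outline:
    """Closed boundary as a cyclic tuple of (segment_length, turn_angle_deg) primitives."""

    def __init__(self, primitives):
        self.primitives = tuple(primitives)

    @cached_property
    def perimeter(self):
        # Gross, cheap property; computed at most once per outline, then cached.
        return sum(length for length, _ in self.primitives)


def rotational_matches(scene, model, tol=1e-6):
    """Count the cyclic shifts (rotations) of the scene boundary that coincide with the model."""
    if (abs(scene.perimeter - model.perimeter) > tol
            or len(scene.primitives) != len(model.primitives)):
        return 0  # cheap rejection before any detailed comparison
    s = scene.primitives
    return sum(1 for k in range(len(s)) if s[k:] + s[:k] == model.primitives)
```

For example, `rotational_matches(Outline([(1, 90)] * 4), Outline([(1, 90)] * 4))` counts all four rotations of a square. A 2-by-1 rectangle matches its model in only two.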
11.4.3 Informal Feature Classification

The domain of this work is one of small, curved tabletop objects, such as a teacup (Fig. 11.16) [Barrow and Popplestone 1971]. The primitives in models and image descriptions are regions, which are found by a process irrelevant here. The regions have certain properties (such as shape or brightness), and they have certain parameterized relations with other regions (such as distance, adjacency, "aboveness"). The input and reference data are both relational structures. The properties and relations are the following:

Fig. 11.15 A small catalog of part boundaries (a) and some sample silhouettes (b). The "heap" will not match any part very well, while the square can match the square model in four different ways, through rotations. Gross properties such as area may be used cheaply to reject matches such as the square with the axle.

Fig. 11.16 An object for recognition by relational matching.

1. Region Properties
Shape1-Shape6: the first six root-mean-square amplitudes of the Fourier components of the φ(s) curve [Chapter 8].
2. Relations between Regions A and B
Bigger: Area(A)/Area(B).
Adjacency: Fraction of A's boundary which also is a boundary of B.
Distance: Distance between centroids divided by the geometric mean of average radii. The average radius is twice the area over the perimeter. Distance is scale, rotation, translation, and reflection invariant.
Compactness: 4π · area/perimeter².
Above, Beside: Vertical and horizontal distance between centroids, normalized by average radius. Not rotation invariant.

The model that might be derived for the cup of Fig. 11.16 is shown in Fig. 11.17.

Fig. 11.17 Relational model for cups such as that of Fig. 11.16.

The program works on objects such as spectacles, pen, cup, or ball. During training, views and their identifications are given to the program, and the program forms a relational structure with information about the mean and variance of the values of the relations and properties. After training, the program is presented with a scene containing one of the learned objects. A relational structure is built describing the scene; the problem is then to match this input description with a reference description from the set of models.

One approximation to the goodness of a match is the number of successes provided by a region correspondence. A one-region object description has 7 relations to check, a two-region object has 28, a three-region one has 63. Therefore, the "successes" criterion could imply the choice of a terrible three-region interpretation over a perfect one-region match. The solution adopted in the matching evaluation is first to grade failures. A failure weight is assigned to a trial match according to how many standard deviations σ from the model mean the relevant parameter is. From zero to three σ implies a success, or a failure weight of 0; from three to six σ, a failure weight of 1; from six to nine σ, a failure weight of 2; and so on. Then the measure "trial's cumulative failure weight" is an improvement on just "successes." On the other hand, simple objects are often found as subparts of complex ones, and one does not want to reject a good interpretation as a complex object in favor of a less explanatory one as a simple object. The final evaluation function adopted is

$$\text{Cost of Match} = \frac{\sum_{\text{tries}} \text{failure weight}}{\text{number of relations}} + \frac{K}{\text{number of regions in view description}}$$

As in Eq. (11.4), the first term measures the average badness of matches between properties (unary relations) and relations between regions. The second term is inversely proportional to the number of regions that are matched, effectively increasing the cost of matches that explain less of the input.
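Several of the relation definitions and the failure-weight grading can be transcribed directly. The sketch below assumes each region is summarized by its area, perimeter, and centroid. It omits the Fourier shape descriptors, Adjacency, Above, and Beside. Treating a deviation of exactly 3σ as a failure weight of 1 is an arbitrary boundary choice, and the default value of K is a hypothetical placeholder.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    area: float
    perimeter: float
    cx: float  # centroid x
    cy: float  # centroid y


def average_radius(r):
    # "Twice the area over the perimeter"; equals the radius for a circle.
    return 2.0 * r.area / r.perimeter


def compactness(r):
    return 4.0 * math.pi * r.area / r.perimeter ** 2


def bigger(a, b):
    return a.area / b.area


def distance(a, b):
    """Centroid distance divided by the geometric mean of the two average radii."""
    d = math.hypot(a.cx - b.cx, a.cy - b.cy)
    return d / math.sqrt(average_radius(a) * average_radius(b))


def failure_weight(value, mean, sd):
    # 0 below 3 sigma, 1 from 3 to 6 sigma, 2 from 6 to 9 sigma, and so on.
    return int(abs(value - mean) / (3.0 * sd))


def cost_of_match(failure_weights, n_relations, n_regions, K=1.0):
    """Average failure weight plus a term penalizing views that explain few regions."""
    return sum(failure_weights) / n_relations + K / n_regions
```

A circle has compactness 1 and an average radius equal to its radius, which is a convenient sanity check on the definitions.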
11.4.4 A Complex Matcher

A program to match linear structures like those of Fig. 11.18 is described in [Davis 1976]. This matcher presents quite a diversity of matching techniques incorporated into one domain-dependent program.


The matching metric is very close to the general metric of Eq. (11.3). The match is characterized by a structural match of reference and input elements and a geometrical transformation (found by parameter fitting) which accounts for the spatial relations between reference and input. Davis forms an association graph between reference and input data. This graph is reduced by parallel iterative relaxation (see Section 12.4), using the "spring functions" to determine which node associations are too costly. Eliminating one node-node match may render others more unlikely, so the node-pruning process iterates until no more nodes are eliminated. What remains is something like an r-connected component of the graph, which specifies an approximate match supported by some amount of consistent relations between nodes.

Fig. 11.18 (a) Reference and (b) input data for a complex shape matching program. (Both panels show outlines labeled "Baffin Island" and "Cape Breton.")

After the process of constraint relaxation, there are still in general several locally consistent interpretations for each component of the input structure. Next, therefore, a tree search is used to establish global consistency and therefore the best match. The tree search is the familiar "best-first" heuristic search through the partial match space, with pruning taking place between each stage of search, again by using the parallel iterative relaxation technique.

EXERCISES

11.1 Relational structures A and B are to be matched by the association-graph, clique-finding method.
Relational structure A: entities u, v, w, x, y, z.
relations P(u), P(w), P(y), R(v), R(x), R(z), F(u, v), F(v, w), F(w, x), F(x, y), F(y, z), F(z, u).
Relational structure B: entities a, b, c, d, e, f.
relations P(a), P(b), P(d), Q(e), Q(f), R(c), F(b, c), F(c, d), F(d, e), F(e, f), F(f, a).
(a) Construct graph structures corresponding to the structures A and B. Label the nodes and arcs.
(b) Construct the association graph of structures A and B.
(c) Visually find the largest maximal cliques in the association graph and thus the best matches between A and B. (There are three.)

11.2 Suppose in a geometric match that two input points on the xy plane are identified with two others taken to correspond with two reference points. It is known that the input data come about only through rotation and translation of the reference data. Given the two input points $(x_1, y_1)$ and $(x_2, y_2)$ and the two reference points $(x'_1, y'_1)$ and $(x'_2, y'_2)$, one way to find the transformation from reference to input is to solve the equation

$$\sum_{i=1}^{2} \left[x_i - (a x'_i + b y'_i + c)\right]^2 + \left[y_i - (-b x'_i + a y'_i + d)\right]^2 = 0$$

The resulting values of a, b, c, and d represent the desired transformation. Solve the equation analytically to get expressions for a, b, c, and d in terms of the reference and input coordinates. What happens if the reference and input data are not related by simple rotation and translation?

11.3 What are the advantages and disadvantages of a uniform method of matching (such as the subgraph isomorphism algorithm approach) as compared to an ad hoc one (such as a decision-tree approach with various empirically derived metrics)?

11.4 In the worst case, for graphs of n nodes, how many partial solutions in total will Algorithm 11.1 have to proceed through? Construct "worst case" graphs J and Y (label their nodes 1, ..., n, of course), assuming that nodes of K are selected in ascending order at any stage.

11.5 Find out something about the state of associative memories in computers. How do they work? How are they used? Would anything like this technology be useful for computer vision? Introspect about familiar phenomena of visual recall, recognition, and memory. Do you have a theory about how human visual memory could possibly work?

11.6 What graph of N nodes has the maximum number of maximal cliques? How many does it have?

11.7 Think about reasoning by analogy and find out something about programs that do analogical reasoning. In what sense can analogical processes be used for computer vision, and, technically, do the matching techniques necessary provide any insight?

11.8 Compare Nevatia's structure matching with Hinton's relaxation-based puppet recognition (Chapter 12).

11.9 Verify the observation made in Section 11.4.3 about the number of relations that must be checked between regions (one region, 7; two regions, 28; three regions, 63; etc.).

REFERENCES

AHO, A. V., J. E. HOPCROFT, and J. D. ULLMAN. The Design and Analysis of Algorithms. Reading, MA: Addison-Wesley, 1974.
AMBLER, A. P., H. G. BARROW, C. M. BROWN, R. M. BURSTALL, and R. J. POPPLESTONE. "A versatile computer-controlled assembly system." Artificial Intelligence 6, 2, 1975, 129-156.
BARROW, H. G. and R. J. POPPLESTONE. "Relational descriptions in picture processing." In MI6, 1971.
BARROW, H. G., J. M. TENENBAUM, R. C. BOLLES, and H. C. WOLF. "Parametric correspondence and chamfer matching: two new techniques for image matching." Proc., DARPA IU Workshop, May 1978, 21-27.
BERGE, C. Graphs and Hypergraphs, 2nd rev. ed. New York: American Elsevier, 1976.
BERZTISS, A. T. "A backtrack procedure for isomorphism of directed graphs." J. ACM 20, 3, July 1973, 365-377.
BITTNER, J. R. and E. M. REINGOLD. "Backtrack programming techniques." Comm. ACM 18, 11, November 1975, 651-656.
BRON, C. and J. KERBOSCH. "Algorithm 457: finding all cliques in an undirected graph (H)." Comm. ACM 16, 9, September 1973, 575-577.
CORNEIL, D. G. and C. C. GOTLIEB. "An efficient algorithm for graph isomorphism." J. ACM 17, 1, January 1970, 51-64.
DAVIS, L. S. "Shape matching using relaxation techniques." IEEE PAMI 1, 1, January 1979, 60-72.
FISCHLER, M. A. and R. A. ELSCHLAGER. "The representation and matching of pictorial structures." IEEE Trans. Computers 22, 1, January 1973, 67-92.
HARALICK, R. M. and G. L. ELLIOTT. "Increasing tree search efficiency for constraint satisfaction problems." Proc., 6th IJCAI, August 1979, 356-364.
HARARY, F. Graph Theory. Reading, MA: Addison-Wesley, 1969.
KNODEL, W. "Bestimmung aller maximalen vollständigen Teilgraphen eines Graphen G nach Stoffers." Computing 3, 3, 1968, 239-240 (and correction in Computing 4, p. 75).
NEVATIA, R. "Structured descriptions of complex curved objects for recognition and visual memory." AIM-250, Stanford AI Lab, October 1974.
NILSSON, N. J. Principles of Artificial Intelligence. Palo Alto, CA: Tioga, 1980.
OSTEEN, R. E. and J. T. TOU. "A clique-directed algorithm based on neighbourhoods in graphs." International J. Computer and Information Science 2, 4, December 1973, 257-268.
REINGOLD, E. M., J. NIEVERGELT, and N. DEO. Combinatorial Algorithms: Theory and Practice. Englewood Cliffs, NJ: Prentice-Hall, 1977.
SCHUDY, R. B. and D. H. BALLARD. "Model-directed detection of cardiac chambers in ultrasound images." TR 12, Computer Science Dept., Univ. Rochester, November 1978.
SHAPIRO, L. G. and R. M. HARALICK. "Structural descriptions and inexact matching." Technical Report CS79011R, Computer Science Dept., Virginia Polytechnic Institute, November 1979.
ULLMAN, J. R. "An algorithm for a subgraph isomorphism." J. ACM 23, 1, January 1976, 31-42.
WINSTON, P. H. "Learning structural descriptions from examples." In PCV, 1975.

Inference
Classical and Extended Inference

12

This chapter explores inference, the process of deducing facts from other known facts. Inference is useful for belief maintenance and is a cornerstone of rational thought. We start with predicate logic, and then explore extended inference systems: production systems, relaxation labeling, and active knowledge (procedures).

Predicate logic (Section 12.1) is a system for expressing propositions and for deriving consequences of facts. It has evolved over centuries, and many clear accounts describe predicate logic in its various forms [Mendelson 1964; Robinson 1965]. It has good formal properties, a nontrivial but automatable inference procedure, and a history of study in artificial intelligence. There are several "classical" extensions (modal logics, higher-order logics) which are studied in the well-settled academic disciplines of metamathematics and philosophy. Extended inference (Section 12.2) is possible in automated systems, and is interesting technically and from an implementational standpoint.

A production system (Section 12.3) is a general rewriting system consisting of a set of rewriting rules (A → BC could mean "rewrite A as BC") and an executive program to apply rewrites. More generally, the rules can be considered "situation-action" pairs ("in situation A, do B and C"). Thus production systems can be used to control computational activities. Production systems, like semantic nets, embody powerful notions that can be used for extended inference.

Labeling schemes (Section 12.4) are unlike most inference mechanisms in that they often involve mathematical optimization in continuous spaces and can be implemented with parallel computation. Labeling is like inference because it establishes consistent "probability-like" values for "hypotheses" about the interpretation of entities.

Active knowledge (Section 12.5) is an implementation of inference in which each chunk of knowledge is a program. This technique goes far in the direction of "proceduralizing" the implementation of propositions. The design issues for such a system include the vocabulary of system primitives and their actions, mechanisms for implementing the flow of control, and overall control of the action of the system.

12.1 FIRST-ORDER PREDICATE CALCULUS

Predicate logic is in many ways an attractive knowledge representation and inference system. However, despite its historical stature, important technical results in automated inference, and much research on inference techniques, logic has not dominated all aspects of mechanized inference. Some reasons for this are presented in Sections 12.1.6 and 12.2. The logical system that has received the most study is first-order predicate logic. General theorem provers in this calculus are cumbersome for reasons which we shall explore. Furthermore, there is some controversy as to whether this logical system is adequate to express the reasoning processes used by human beings [Hayes 1977; Collins 1978; Winograd 1978; McCarthy and Hayes 1969]. We briefly describe some aspects of this controversy in Section 12.1.6. Our main purpose is to give the flavor of predicate calculus-based methods by describing briefly how automated inference can proceed with the formulae of predicate calculus expressed in the convenient clause form. Clause form is appealing for two reasons. First, it can be represented usefully in relational tuple or semantic network notation (Section 12.1.5). Second, the predicate calculus clause and inference system may be easily compared to production systems (Section 12.3).
12.1.1 Clause-Form Syntax (Informal)

In this section we describe the syntax of clause-form predicate calculus sentences. In the next, a more standard nonclausal syntax is described, together with a method for assigning meaning to grammatical logical expressions. Next, we show briefly how to convert from nonclausal to clausal syntax.

A sentence is a set of clauses. A clause is an ordered pair of sets of atomic formulae, or atoms. Clauses are written as two (possibly null) sets separated by an arrow, pointing from the hypotheses or conditions of the clause to its conclusion. The null clause, whose hypotheses and conclusion are both null, is written □. For example, a clause could appear as

$$A_1, \ldots, A_n \Rightarrow B_1, \ldots, B_m$$

where the A's and B's are atoms. An atom is an expression

$$P(t_1, \ldots, t_j)$$

where P is a predicate symbol which "expects j arguments," each of which must be a variable, constant symbol, or term. A term is an expression

$$f(t_1, \ldots, t_k)$$

where f is a function symbol which "expects k arguments," each of which may be a term. It is convenient to treat constant symbols alone as terms.

A careful (formal) treatment of the syntax of logic must deal with technical issues such as keeping constant and term symbols straight, associating the number of expected arguments with a predicate or function symbol, and assuring an infinite supply of symbols.

For example, the following are sentences of logic.

Obscured(Backface(Block1))
Visible(Kidney)
Road(x), Unpaved(x) ⇒ Narrow(x)
12.1.2 Nonclausal Syntax and Logic Semantics (Informal)

Nonclausal Syntax

Clause form is a simplified but logically equivalent version of the nonclausal logic expressions, which are perhaps more familiar. A brief review of nonclausal syntax follows. The concepts of constant symbols, variables, terms, and atoms are still basic. A set of logical connectives provides unary and binary operators to combine atoms to form well-formed formulae (wffs). If A and B are atoms, then A is a wff, as are

$\sim A$ ("not A")
$A \Rightarrow B$ ("A implies B," or "if A then B")
$A \vee B$ ("A or B")
$A \wedge B$ ("A and B")
$A \Leftrightarrow B$ ("A is equivalent to B," or "A if and only if B")

Thus an example of a wff is

$$\text{Back}(\text{Face}) \vee (\text{Obscured}(\text{Face})) \Rightarrow \sim(\text{Visible}(\text{Face}))$$

The last concept is that of universal and existential quantifiers, the use of which is illustrated as follows.

$(\forall x)$ (wff using "x" as a variable)
$(\exists\, \text{thing})$ (wff using "thing" as a variable)

A universal quantifier ∀ is interpreted as a conjunction over all domain elements, and an existential quantifier ∃ as a disjunction over all domain elements. Hence their usual interpretation as "for each element ..." and "there exists an element ...." Since a quantified wff is also a wff, quantifiers may be iterated and nested. A quantifier quantifies the "dummy" variable associated with it (x and thing in the examples above). The wff within the scope of a quantifier is said to have this quantified variable bound by the quantifier. Typically only wffs or clauses all of whose variables are bound are allowed.

Semantics

How does one assign meaning to grammatical clauses and formulae? The semantics of logic formulae (clauses and wffs alike) depends on an interpretation and on the meaning of connectives and quantifiers. An interpretation specifies the following.

1. A domain of individuals
2. A particular domain element is associated with each constant symbol
3. A function over the domain (mapping k individuals to individuals) is associated with each function symbol
4. A relation over the domain (a set of ordered k-tuples of individuals) is associated with each predicate symbol

The interpretation establishes a connection between the symbols in the representation and a domain of discourse (such as the entities one might see in an office or chest x-ray). To establish the truth or falsity of a clause or wff, a value of TRUE or FALSE must be assigned to each atom. This is done by checking in the world of the domain to see if the terms in the atoms satisfy the relation specified by the predicate of the atom. If so, the atom is TRUE; if not, it is FALSE. (Of course, the terms, after evaluating their associated functions, ultimately specify individuals.) For example, the atom

GreaterThan(5, π)

is true under the obvious interpretation and false with domain assignments such that GreaterThan means "Is the author of," 5 means the book Gone With the Wind, and π means Rin Tin Tin.

After determining the truth values of atoms, wffs with connectives are given truth values by using the truth tables of Table 12.1, which specify the semantics of the logical connectives. The relation of this formal semantics of connectives to the usual connectives used in language (especially "implies") is interesting, and one must be careful when translating natural language statements into predicate calculus.

Table 12.1
A   B   ~A   A∧B   A∨B   A⇒B   A⇔B
T   T   F    T     T     T     T
T   F   F    F     T     F     F
F   T   T    F     T     T     F
F   F   T    F     F     T     T

The semantics of clause-form expressions is now easy to explain. A sentence is the conjunction of its clauses. A clause

$$A_1, \ldots, A_n \Rightarrow B_1, \ldots, B_m$$

with variables $x_1, \ldots, x_k$ is to be understood as

$$\forall x_1, \ldots, x_k \; (A_1 \wedge \cdots \wedge A_n) \Rightarrow (B_1 \vee \cdots \vee B_m).$$

The null clause is to be understood as a contradiction. A clause with no conditions is an assertion that at least one of the conclusions is true. A clause with null conclusion is a denial that the conditions (hypotheses) are true.
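For a finite domain, the clause semantics just given can be checked mechanically by trying every binding of the clause's variables. The sketch below rests on several assumptions: atoms are (predicate, arguments) tuples, variables are strings beginning with "?", an interpretation supplies each predicate as a set of tuples, and function symbols are not handled. Over an infinite domain no such exhaustive check is possible, which is the root of the decidability issue taken up in Section 12.1.4.

```python
from itertools import product


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def clause_vars(clause):
    hyps, concls = clause
    return sorted({t for _, args in hyps + concls for t in args if is_var(t)})


def atom_true(atom, binding, relations):
    pred, args = atom
    # Variables are replaced by their bound values; constants stand for themselves.
    return tuple(binding.get(t, t) for t in args) in relations[pred]


def clause_true(clause, domain, relations):
    """A1,...,An => B1,...,Bm holds if, for every binding of its variables,
    some hypothesis is false or some conclusion is true."""
    hyps, concls = clause
    names = clause_vars(clause)
    for values in product(domain, repeat=len(names)):
        binding = dict(zip(names, values))
        if all(atom_true(a, binding, relations) for a in hyps) and not any(
            atom_true(b, binding, relations) for b in concls
        ):
            return False
    return True


def sentence_true(sentence, domain, relations):
    """A sentence is the conjunction of its clauses."""
    return all(clause_true(c, domain, relations) for c in sentence)
```

The three special cases fall out of the same loop. The null clause is false under every interpretation. A clause with no hypotheses needs some true conclusion, and a clause with no conclusions is true only if its hypotheses are never jointly true.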


12.1.3 Converting Nonclausal Form to Clauses

The conversion of nonclausal to clausal form is done by applying straightforward rewriting rules, based on logic identities (ultimately the truth tables). One trick is needed, however, to remove existential quantifiers. Skolem functions replace existentially quantified variables, according to the following reasoning. Consider the wff $(\forall x)(\exists y)(\mathrm{Behind}(y, x))$. With the proper interpretation, this wff might correspond to saying "For any object x we consider, there is another object y which is behind x." Since the $\exists$ is within the scope of the $\forall$, the particular y might depend on the choice of x. The Skolem function trick is to remove the existential quantifier and use a function to make explicit the dependence on the bound universally quantified variable. The resulting wff could be $(\forall x)(\mathrm{Behind}(\mathrm{SomethingBehind}(x), x))$, which might be rendered in English: "Any object x has another object behind it; furthermore, some Skolem function we choose to call SomethingBehind determines which object is behind its argument." This is a notational trick only. The existential quantification guarantees that the new function exists, and both notations are equally vague about which entity the function actually produces.

In general, one must replace each occurrence of an existentially quantified variable in a wff by a (newly created Skolem) function of all the universally quantified variables whose scope includes the existential quantifier being eliminated. If there is no universal quantifier, the result is a new function of no arguments, or a new constant. Thus $(\exists x)(\mathrm{Red}(x))$, which may be interpreted "Something is red," is rewritten as something like $\mathrm{Red}(\mathrm{RedThing})$, or "Something is red, and furthermore let's call it RedThing."

The conversion from nonclausal to clausal form proceeds as follows (for more details, see [Nilsson 1971]). Remove all implication signs with the identity $(A \Rightarrow B) \Leftrightarrow ((\neg A) \vee B)$. Use De Morgan's laws (such as $\neg(A \vee B) \Leftrightarrow ((\neg A) \wedge (\neg B))$) and their extension to quantifiers, together with cancellation of double negations, to force negations to refer only to single predicate letters. Rewrite variables to give each quantifier its own unique dummy variable. Use Skolem functions to remove existential quantifiers. All variables are now universally quantified, so eliminate the quantifier symbols (which remain implicit), and rearrange the expression into conjunctive normal form (a conjunction of disjunctions). The $\wedge$'s now connect disjunctive clauses (at last!). Eliminate the $\wedge$'s, obtaining possibly several clauses from the original expression.

At this point, the original expression has yielded multiple disjunctive clauses. Clauses in this form may be used directly in automatic theorem provers [Nilsson 1971]. The disjunctive clauses are not quite in the clause form defined earlier, however. To get clauses into the final form, convert them into implications. Group the negated atoms, re-expanding the scope of negation to include them all, and convert the disjunction of negations into the negation of a conjunction. Reintroduce one implication to go from

$$\neg(A_1 \wedge A_2 \wedge \dots \wedge A_n) \vee B_1 \vee B_2 \vee \dots \vee B_m$$

to

$$A_1 \wedge A_2 \wedge \dots \wedge A_n \Rightarrow B_1 \vee B_2 \vee \dots \vee B_m$$

To obtain the final form, replace the connectives (which remain implicit) with commas, giving $A_1, \dots, A_n \Rightarrow B_1, \dots, B_m$.
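The Skolemization step described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the wff is already in prenex form (all quantifiers in front) and is encoded as nested tuples. The tuple encoding, the function names `substitute` and `skolemize`, and the default Skolem names `Sk1`, `Sk2`, ... are our own choices, not part of the text.

```python
from itertools import count


def substitute(term, mapping):
    """Replace variable names in a term or atom according to mapping.

    Terms are strings (variables or constants) or tuples
    (functor, arg1, arg2, ...); the functor itself is never replaced.
    """
    if isinstance(term, str):
        return mapping.get(term, term)
    return (term[0],) + tuple(substitute(arg, mapping) for arg in term[1:])


def skolemize(prefix, matrix, names=None):
    """Remove existential quantifiers from a prenex wff.

    prefix: list of ("forall" | "exists", variable) pairs, outermost first.
    matrix: the quantifier-free body.
    names:  optional {existential variable: Skolem function name}.
    Returns (universal variables, Skolemized matrix); the universal
    quantifiers are left implicit, as in clause form.
    """
    names = names or {}
    fresh = count(1)
    universals, mapping = [], {}
    for kind, var in prefix:
        if kind == "forall":
            universals.append(var)
        elif kind == "exists":
            name = names.get(var) or f"Sk{next(fresh)}"
            # A function of every universal variable whose scope includes this
            # existential; with no such variable it is simply a new constant.
            mapping[var] = (name, *universals) if universals else name
        else:
            raise ValueError(f"unknown quantifier {kind!r}")
    return universals, substitute(matrix, mapping)


print(skolemize([("forall", "x"), ("exists", "y")], ("Behind", "y", "x"),
                {"y": "SomethingBehind"}))
# (['x'], ('Behind', ('SomethingBehind', 'x'), 'x'))
print(skolemize([("exists", "x")], ("Red", "x"), {"x": "RedThing"}))
# ([], ('Red', 'RedThing'))
```

Because the prefix is scanned outermost first, each Skolem term captures exactly the universal variables that precede its existential quantifier.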
12.1.4 Theorem Proving

Good accounts of the basic issues of automated theorem proving are given in [Nilsson 1971; Kowalski 1979; Loveland 1978]. The basic ideas are as follows. A sentence is inconsistent, or unsatisfiable, if it is false in every interpretation. Some trivially inconsistent sentences are those containing the null clause, or simple contradictions such as the same clause being both unconditionally asserted and denied. A sentence that is true in all interpretations is valid. The validity of individual clauses may be checked by applying the truth tables unless quantifiers are present. In that case an infinite number of formulae are being specified, and the truth status of such a clause is not algorithmically decidable. Thus it is said that first order predicate calculus is undecidable. More accurately, it is semidecidable, because any valid wff can be established as such in some (generally unpredictable) finite time. The validation procedure may run forever on invalid formulae. The rub is that one can never be sure whether it is running uselessly or is about to terminate in the next instant.

The notion of a proof is bound up with the notion of logical entailment. A clause C logically follows from a set of clauses S (we take S to prove C) if every interpretation that makes S true also makes C true. A formal proof is a sequence of inferences which establishes that C logically follows from S. In nonclausal predicate logic, inferences are rewritings of axioms and previously established formulae in accordance with rules of inference such as

Modus Ponens: From $A$ and $A \Rightarrow B$ infer $B$.
Modus Tollens: From $\neg B$ and $A \Rightarrow B$ infer $\neg A$.
Substitution: e.g., from $(\forall x)(\mathrm{Convex}(x))$ infer $\mathrm{Convex}(\mathrm{Region31})$.
Syllogisms, and so forth.
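Validity of a quantifier-free formula can be settled with truth tables, so the propositional content of the rules of inference listed above can be checked mechanically. The following sketch enumerates every truth assignment. The helper names are hypothetical, and the sketch says nothing about the quantified case, which is only semidecidable.

```python
from itertools import product


def implies(a, b):
    """Material implication: A => B is false only when A is true and B false."""
    return (not a) or b


def is_valid(formula, n_vars):
    """True if the quantifier-free formula holds under every truth assignment."""
    return all(formula(*values) for values in product((False, True), repeat=n_vars))


def is_unsatisfiable(formula, n_vars):
    """True if the formula is false under every truth assignment."""
    return not any(formula(*values) for values in product((False, True), repeat=n_vars))


def modus_ponens(a, b):
    return implies(a and implies(a, b), b)


def modus_tollens(a, b):
    return implies((not b) and implies(a, b), not a)


def affirming_the_consequent(a, b):
    # A tempting but invalid pattern, included for contrast.
    return implies(b and implies(a, b), a)


for rule in (modus_ponens, modus_tollens, affirming_the_consequent):
    print(rule.__name__, is_valid(rule, 2))
# modus_ponens True
# modus_tollens True
# affirming_the_consequent False
```

The counterexample for the invalid pattern is A false and B true, which is exactly what the enumeration finds.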
Ch. 12 Inference

Automatic clausal theorem provers usually try to establish that a clause C logically follows from the set of clauses S. They do this by showing that S and $\neg C$ taken together are unsatisfiable. This rather backward approach is a technical effect of the way theorem provers usually work, which is to derive a contradiction.

The fundamental and surprising result that all true theorems are provable in finite time, together with an algorithmic (but inefficient) way to find the proof, is due to Herbrand [Herbrand 1930]. The crux of the result is that although the domain of individuals who might participate in an interpretation may be infinite, only a finite number of interpretations need be investigated to establish the unsatisfiability of a set of clauses, and in each only a finite number of individuals must be considered. Robinson discovered a computationally efficient way to perform automatic inference [Robinson 1965]. It uses a single rule of inference called resolution. This single rule preserves the completeness of the system (all true theorems are provable) and its correctness (no false theorems are provable).

The rule of resolution is very simple. Resolution involves matching a condition of one clause A with a conclusion of another clause B. The derived clause, called the resolvent, consists of the unmatched conditions and conclusions of A and B instantiated by the matching substitution. Matching two atoms amounts to finding a substitution of terms for variables which, if applied to the atoms, would make them identical. Theorem proving now means resolving clauses in the hope of producing the empty clause, a contradiction.

As an example, a simple resolution proof goes as follows. Say it is desired to prove that a particular wastebasket is invisible. We know that the wastebasket is behind Brian's desk and that anything behind something else is invisible (we have a simple-minded view of the world in this little example). The givens are the wastebasket location and our naive belief about visibility:

$$\Rightarrow \mathrm{Behind}(\mathrm{WasteBasket}, \mathrm{DeskOf}(\mathrm{Brian})) \tag{12.1}$$

$$\mathrm{Behind}(\mathit{object}, \mathit{obscurer}) \Rightarrow \mathrm{Invisible}(\mathit{object}) \tag{12.2}$$

Here Behind and Invisible are predicates, DeskOf is a function, Brian and WasteBasket are constants (they denote particular specific objects), and object and obscurer are (universally quantified) variables. The negation of the conclusion we wish to prove is

$$\mathrm{Invisible}(\mathrm{WasteBasket}) \Rightarrow \tag{12.3}$$

or, "Asserting the wastebasket is invisible is contradictory." Our task is to show that this set of clauses is inconsistent, so that the invisibility of the wastebasket is proved. The resolution rule consists of matching clauses on opposite sides of the arrow which can be unified by a substitution of terms for variables. A substitution that works is:

Substitute WasteBasket for object and DeskOf(Brian) for obscurer in (12.2).

Then a cancellation can occur between the right side of (12.1) and the left side of (12.2). Another cancellation can then occur between the right side of (12.2) and the left side of (12.3), deriving the empty clause (a contradiction), Quod Erat Demonstrandum.

Anyone who has ever tried to do a nontrivial logic proof knows that finding which inference to apply to make the proof terminate involves searching. Usually human beings have an idea of "what they are trying to prove," and can occasionally call upon some domain semantics to guide which inferences make sense. Notice that at no time in a resolution proof or other formal proof of logic is a specific interpretation singled out; the proof is about all possible interpretations. If deductions are made by appealing to intuitive, domain-dependent, semantic considerations (instead of purely syntactic rewritings), the deduction system is informal. Almost all of mathematics is informal by this definition, since normal proofs are not pure rewritings.

Many nonsemantic heuristics can also guide the search, such as trying to reduce the differences between the current formulae and the goal formula to be proved. People use such heuristics, as does the Logic Theorist, an early nonclausal, nonresolution theorem prover [Newell et al. 1963].

A basic resolution theorem prover is guaranteed to terminate with a proof if one exists, but usually resource limitations such as time or memory place an upper limit on the effort one can afford to let the prover spend. Since all the resolvents are added to the set of clauses from which further conclusions may be derived, selecting which clauses to resolve becomes a vital question.
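The wastebasket refutation can be reproduced by a small resolution sketch. It is an illustration under stated assumptions, not a practical prover. Clauses are encoded as hypothetical Python tuples of (conditions, conclusions), variables are strings starting with `?`, and `refute` simply resolves every pair of clauses for a fixed number of rounds, which also makes the resource limit discussed above explicit.

```python
from itertools import count

# Clauses are (conditions, conclusions) pairs of atom tuples, read as
# "conditions => conclusions". Variables are strings beginning with "?".


def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t))


def unify(x, y, s):
    """Extend substitution s so that x and y become identical, or return None."""
    x, y = walk(x, s), walk(y, s)
    if x == y:
        return s
    if is_var(x):
        return None if occurs(x, y, s) else {**s, x: y}
    if is_var(y):
        return None if occurs(y, x, s) else {**s, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None


def subst(t, s):
    t = walk(t, s)
    return tuple(subst(a, s) for a in t) if isinstance(t, tuple) else t


def rename(clause, tag):
    """Give a clause's variables fresh names so two clauses share none."""
    def r(t):
        if is_var(t):
            return f"{t}_{tag}"
        return tuple(r(a) for a in t) if isinstance(t, tuple) else t
    conds, concls = clause
    return tuple(map(r, conds)), tuple(map(r, concls))


def resolvents(a, b):
    """Match a conclusion of clause a against a condition of clause b."""
    (ca, pa), (cb, pb) = a, b
    for i, concl in enumerate(pa):
        for j, cond in enumerate(cb):
            s = unify(concl, cond, {})
            if s is not None:
                conds = ca + cb[:j] + cb[j + 1:]
                concls = pa[:i] + pa[i + 1:] + pb
                yield (tuple(subst(t, s) for t in conds),
                       tuple(subst(t, s) for t in concls))


def refute(clauses, max_rounds=10):
    """Saturate by resolution; True if the empty clause appears.

    False only means no contradiction was found within max_rounds.
    """
    known, tags = set(clauses), count()
    for _ in range(max_rounds):
        new = set()
        for a in known:
            for b in known:
                for r in resolvents(rename(a, next(tags)), rename(b, next(tags))):
                    if r == ((), ()):
                        return True
                    new.add(r)
        if new <= known:
            return False
        known |= new
    return False


S = [
    ((), (("Behind", "WasteBasket", ("DeskOf", "Brian")),)),              # (12.1)
    ((("Behind", "?object", "?obscurer"),), (("Invisible", "?object"),)),  # (12.2)
    ((("Invisible", "WasteBasket"),), ()),                                # (12.3)
]
print(refute(S))  # True: the clauses are inconsistent
```

Dropping (12.1) leaves a consistent set, and `refute(S[1:])` returns False once its round budget is used up, rather than proving anything about the missing clause.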
Much research in automatic theorem proving has been devoted to reducing the search space of derivations for proofs [Nilsson 1980; Loveland 1970]. This has usually been done through heuristics based on formal aspects of the deductions (such as: make deductions that will not drastically increase the number of active clauses). Guidance from domain-dependent knowledge is not only hard to implement but also directly against the spirit of resolution theorem proving, which attempts to do all the work with a uniform inference mechanism working on uninterpreted symbol strings. A moderation of this view allows the "intent" of a clause to guide its application in the proof. This can result in substantial savings of effort; an example is the treatment of "frame axioms" recommended by Kowalski (Section 13.1.4). Ad hoc, nonformalizable, domain-dependent methods are not usually welcome in automatic theorem-proving circles. However, such heuristics only guide the activity of a formal system; they do not render it informal.
12.1.5 Predicate Calculus and Semantic Networks

Predicate calculus theorem proving may be assisted by adding more relational structure to the set of clauses. The structure in a semantic net comes from links which connect nodes. Nodes are accessed by following links, so the link structure determines the availability of information in nodes. Links can thus help by providing quick access to relevant information, given that one is "at" a particular node. Although there are several ways of representing predicate calculus formulae in networks, we adopt here that of [Kowalski 1979; Deliyanni and Kowalski 1979]. The steps are simple:

1. Use a partition to represent the clause.
2. Convert all atoms to binary predicate atoms.
3. Distinguish between conditions and conclusions.

Recall that in Chapter 10, a partition is defined as a set of nodes and arcs in a graph. The internal structure of the partition cannot be determined from outside it. Partitioning extends the structure of a semantic net enough to allow unambiguous representations of all of first order predicate calculus.

The first step in developing the network representation for clauses is to convert each relation to a binary one. We distinguish between conditions and conclusions by using an additional bit of information for each arc. Diagrammatically, an arc is drawn with a double line if it is a condition and a single line if it is a conclusion. Thus the earlier example S = {(12.1), (12.2), (12.3)} can be transformed into the network shown in Fig. 12.1.

This figure hints at the advantages of the network embedding for clauses: It is an indexing scheme. This scheme does not indicate which clauses to resolve next, but it can reduce the possibilities enormously. If the most recent resolution involved a given clause with a given set of terms, other clauses which also have those terms will be represented by explicit arcs nearby in the network (this would not be true if the clauses were represented as a set). Similarly, other clauses involving the same predicate symbols are also nearby, being indexed by those symbols. Again, this would not be true in the set representation. Thus the embedded network representation contains argument indices and predicate indices which can be extremely helpful in the inference process.

Fig. 12.1 Converting clauses to networks.

A very simple example illustrates the foregoing points. Suppose that S consists of the set of clauses

$$\mathrm{SouthOf}(\mathrm{river2}, x),\ \mathrm{NorthOf}(\mathrm{river1}, x) \Rightarrow \mathrm{Between}(\mathrm{river1}, \mathrm{river2}, x) \tag{12.4}$$

$$\Rightarrow \mathrm{SouthOf}(u, \mathrm{silo30}) \tag{12.5}$$

$$\Rightarrow \mathrm{NorthOf}(\mathrm{river1}, \mathrm{silo30}) \tag{12.6}$$

Clause (12.5) might arise when it is determined that "silo30" is south of some feature in the image whose identity is not known. Bottom-up inference derives new assertions from old ones. Thus in the example above the variable substitutions u = river2, x = silo30 match assertion (12.5) with the general clause (12.4) and allow the inference

$$\mathrm{NorthOf}(\mathrm{river1}, \mathrm{silo30}) \Rightarrow \mathrm{Between}(\mathrm{river1}, \mathrm{river2}, \mathrm{silo30}) \tag{12.7}$$

Consequently, use (12.6) and (12.7) to assert

$$\Rightarrow \mathrm{Between}(\mathrm{river1}, \mathrm{river2}, \mathrm{silo30}) \tag{12.8}$$

Suppose that this was not the case: that is, that

$$\mathrm{Between}(\mathrm{river1}, \mathrm{river2}, \mathrm{silo30}) \Rightarrow \tag{12.9}$$

and that S = {(12.4), (12.9)}. One could then use top-down inference, which infers new denials from old ones. In this case

$$\mathrm{NorthOf}(\mathrm{river1}, \mathrm{silo30}),\ \mathrm{SouthOf}(\mathrm{river2}, \mathrm{silo30}) \Rightarrow \tag{12.10}$$

follows with the variable substitution x = silo30. This can be interpreted as follows: "If x is really silo30, then it cannot be both north of river1 and south of river2." Figure 12.2 shows two examples using the network notation.

Now suppose the goal is to prove that (12.8) logically follows from (12.4) through (12.6) and the substitutions. The strategy would be to negate (12.8), add it to the data base, and show that the empty clause can be derived. Negating an assertion produces a denial, in this case (12.9), and now the set of axioms (including the denial) consists of {(12.4), (12.5), (12.6), (12.9)}. It is easy to repeat the earlier steps to the point where the set of clauses includes (12.8) and (12.9), which resolve to produce the empty clause. Hence the theorem is proved.
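The indexing role of the network embedding can be imitated without drawing the network. The sketch below uses a plain dictionary stand-in for the partitioned net. One index maps each predicate symbol to the clauses (and sides of the arrow) where it occurs; the other maps each constant to the clauses that mention it. The clause encoding and function names are hypothetical.

```python
from collections import defaultdict

# Clauses (12.4)-(12.6) as (conditions, conclusions); variables start with "?".
CLAUSES = {
    "12.4": ((("SouthOf", "river2", "?x"), ("NorthOf", "river1", "?x")),
             (("Between", "river1", "river2", "?x"),)),
    "12.5": ((), (("SouthOf", "?u", "silo30"),)),
    "12.6": ((), (("NorthOf", "river1", "silo30"),)),
}


def build_indices(clauses):
    """Predicate index: symbol -> {(clause, side)}; argument index: constant -> {clause}."""
    pred_index, arg_index = defaultdict(set), defaultdict(set)
    for cid, (conds, concls) in clauses.items():
        for side, atoms in (("condition", conds), ("conclusion", concls)):
            for pred, *args in atoms:
                pred_index[pred].add((cid, side))
                for a in args:
                    if isinstance(a, str) and not a.startswith("?"):
                        arg_index[a].add(cid)
    return pred_index, arg_index


def resolution_candidates(pred, side, pred_index):
    """Clauses holding the same predicate on the opposite side of the arrow."""
    other = "condition" if side == "conclusion" else "conclusion"
    return sorted(cid for cid, s in pred_index[pred] if s == other)


pred_index, arg_index = build_indices(CLAUSES)
print(resolution_candidates("SouthOf", "conclusion", pred_index))  # ['12.4']
print(sorted(arg_index["silo30"]))                                  # ['12.5', '12.6']
```

As in the text, the indices do not decide which resolution to perform next; they only cut the candidates down to clauses that could possibly match.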
12.1.6 Predicate Calculus and Knowledge Representation

Pure predicate calculus has strengths and weaknesses as a knowledge representation system. Some of the seeming weaknesses can be overcome by technical "tricks." Some are not inherent in the representation but are a property of the common interpreters used on it (i.e., of state-of-the-art theorem provers). Some problems are relatively basic, and the majority opinion seems to be that first order predicate logic must be extended before it becomes a representation scheme satisfactorily matched to the power of the deductive methods applied by human beings. Opinion is divided on the technical aspects of such enhancements.

Fig. 12.2 Resolution using networks. (a) Bottom-up inference as a result of substitutions u = river2, x = silo30. (b) Top-down inference as a result of the substitution x = silo30.

Predicate calculus has several strengths, some of which we list below.

1. Predicate logic is a well-polished gem, having been refined and studied for several generations. It was designed to represent knowledge and inference. One knows what it means. Its model theory and proof theory are explicit and lucid [Hayes 1977; 1980].

2. Predicate logic can be considered a language with a machine-independent semantics; the meaning of the language is determined by the laws of logic, not by the actual programming system upon which the logic is "executed."
3. Predicate calculus clauses with only one conclusion atom (Horn clauses) may be considered as "procedures," with the single conclusion being the name of the procedure and the conditions being the procedure body, which itself is made up of procedure calls. This view of logic leads to the development of programming languages based on predicate logic (such as PROLOG [Warren et al. 1977; McDermott 1980]). These programs exhibit nondeterminism in several interesting ways; the order of computations is not specified by the proof procedure (and is not restricted by it, either). Several resolutions are in general possible for any clause; the combinations determine many computations and several distinguishable forms of nondeterminism [Kowalski 1974].
4. Predicate logic may be interpreted as a problem-reduction system. Then a (Horn) clause of the form $\Rightarrow B$ represents a solved problem. One of the form $A_1, \dots, A_n \Rightarrow$ with variables $x_1, \dots, x_k$ is a goal statement, or command, which is to find the x's that solve the problems $A_1, \dots, A_n$. Finding the x's solves the goal. A clause $A_1, \dots, A_n \Rightarrow B$ is a solution method, which reduces the solution of B to a combination of solutions of the A's. This interpretation of Horn clauses maps cleanly into a standard and-or goal tree formulation of problem solving.
5. Resolutions may be performed on the left or right of clauses, and the resulting derivation trees correspond, in the problem-solving interpretation of predicate calculus, to top-down and bottom-up versions of problem solving. This duality is very important in conceptualizing aspects of problem solving.
6. There is a uniform proof procedure for logic which is guaranteed to prove any true theorem in finite time (logic is semidecidable and complete). No false theorems are provable (logic is correct). These and other good formal properties are important when formally establishing the properties of a knowledge representation system.

Predicate calculus is not everyone's favorite, however. Some of its (perceived) disadvantages are given below, together with ways they might be countered.

1. Sometimes the axioms necessary to implement relatively common concepts are not immediately obvious. A standard example is "equality." These largely technical problems are annoying but not basic.
2. The "first order" in first order predicate calculus means that the system does not allow clauses with variables ranging over an infinite number of predicates, functions, assertions, and sentences (e.g., "All unary functions are boring" cannot be stated directly). This problem may be ameliorated by a notational trick: the situations under which predicates are true are indicated with a Holds predicate. Thus instead of writing On(block1, surface, situation1), write Holds(On(block1, surface), situation1). This notation allows inferences about many situations with only one added axiom. The "situational calculus" reappears in Section 12.3.1. Another useful notational trick is a Diff relation, which holds between two terms if they are syntactically different. There are infinitely many axioms asserting that terms are different; the actual system can be made to incorporate them implicitly in a well-defined way. The Diff relation is also used in Section 12.3.1.
3. The frame problem (so called for historical reasons and not related to the frames described in Section 10.3.1) is a classic bugbear of problem-solving methods, including predicate logic. One aspect of this problem is that, for technical reasons, the axioms that describe actions (in a general sense a visual test is an action) must state explicitly that almost all assertions which were true in a world state remain true in the new world state after the action is performed. These new axioms cause a huge increase in the "bureaucratic overhead" necessary to maintain the state of the world. Currently, no really satisfactory way of handling this problem has been devised. The most common way to attack this aspect of the frame problem is to use explicit "add lists" and "delete lists" ([Fikes 1977], Chapter 13), which attempt to specify exactly what changes when an action occurs. New true assertions are added, and those that are false after an action must be deleted. This device is useful, but examples demonstrating its inadequacy are readily constructed. More aspects of the frame problem are given in Chapter 13.
4. There are several sorts of reasoning performed by human beings that predicate logic does not pretend to address. It does not include the ability to describe its own formulae (a form of "quotation"), the notion of defaults, or a mechanism for plausible reasoning. Extensions to predicate logic, such as modal logic, are classically motivated. More recently, work on extensions addressing the topics above has begun to receive attention [McCarthy 1978; Reiter 1978; Hayes 1977]. There is still active debate as to whether such extensions can capture many important aspects of human reasoning and knowledge within the model-theoretic system. The contrary view is that in some reasoning, the very process of reasoning is itself an important part of the semantics of the representation. Examples of such extended inference systems appear in the remainder of this chapter, and the issues are addressed in more detail in the next section.
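The problem-reduction reading of Horn clauses (points 3 and 4 in the first list above) amounts to backward chaining over an and-or goal tree. A minimal propositional sketch follows. It ignores variables and unification, and the program, the atom names, and the depth bound are all illustrative assumptions.

```python
# A propositional Horn-clause program: each conclusion maps to a list of
# alternative bodies (lists of conditions). An empty body, "=> B", marks
# a solved problem. The atom names are hypothetical.
PROGRAM = {
    "Invisible": [["Behind"]],            # Behind => Invisible
    "Behind": [[]],                       # => Behind
    "Narrow": [["RailBridge"], ["Footbridge"]],
    "Loop": [["Loop"]],                   # Loop => Loop, never grounded
}


def solve(goal, program, depth=20):
    """Problem reduction: OR over the clauses for goal, AND over each body."""
    if depth == 0:
        return False  # resource bound reached; this is not a proof of falsity
    return any(all(solve(sub, program, depth - 1) for sub in body)
               for body in program.get(goal, []))


def solve_goals(goals, program):
    """A goal statement A1, ..., An => succeeds if every Ai is solved."""
    return all(solve(g, program) for g in goals)


print(solve_goals(["Invisible"], PROGRAM))            # True
print(solve_goals(["Invisible", "Narrow"], PROGRAM))  # False
print(solve("Loop", PROGRAM))                         # False (depth bound)
```

The `any` over alternative bodies is the OR branching and the `all` over a body's conditions is the AND branching of the goal tree; the order in which Python tries them is one arbitrary choice among the many computations the text mentions.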

12.2 COMPUTER REASONING

Artificial intelligence in general and computer vision in particular must be concerned with efficiency and plausibility in inference [Winograd 1978]. Computer-based knowledge representations and their accompanying inference processes often sacrifice classical formal properties for gains in control of the inference process and for flexibility in the sorts of "truth" which may be inferred.
Sec. 12.2 Computer Reasoning


Automated inference systems usually have inference methods that achieve efficiency through implementational, computation-based inference criteria. For example, truth may be defined as a successful lookup in a data base, falsity as the failure to find a proof with a given allocation of computational resources, and the establishment of truth may depend on the order in which deductions are made. The semantics of computer knowledge representations is intimately related to the inference process that acts on them. Therefore, it is possible to define knowledge representations and interpreters in computers whose properties differ fairly radically from those of classical representations and proof procedures, such as the first order predicate calculus. For instance, although the systems are deterministic, they may not be formally consistent (loosely, they may contain contradictory information). They may not be complete (they cannot derive all true theorems from the givens); it may be possible to prove P from Q but $\neg P$ from Q and R. The set of provable theorems may not be recursively enumerable [Reiter 1978].

Efforts are being made to account for the "extended inference" needed by artificial intelligence using more or less classical logic [McCarthy 1978; Reiter 1978; Hayes 1977; 1978a; 1978b; Kowalski 1974, 1979]. In each case, the classical view of logic demands that the deductive process and the deducible truths be independent. On the other hand, it is reasonable to devote attention to developing a nonclassical semantics of these inference processes; this topic is in the research stage at this writing.

Several knowledge representations and inference methods using them are "classical" in the artificial intelligence world; that is, they provide paradigmatic methods of dealing with the issues of computational inference. They include STRIPS [Fikes and Nilsson 1971], the situational calculus [McCarthy and Hayes 1969], PLANNER and CONNIVER [Hewitt 1972; Sussman and McDermott 1972], and semantic net representations [Hendrix 1979; Brachman 1979].
To illustrate the issue of consistency, and to show how various sorts of propositions can be represented in semantic nets, we address the question of how the order of inference can affect the set of provable theorems in a system. Consider the semantic net of Fig. 12.3. The idea is that in the absence of specific information to the contrary, one should assume that railroad bridges are narrow. There are exceptions, however, such as Bridge02 (which has a highway bridge above the rail bridge, say). The network is clearly inconsistent, but trouble is avoided if inferences are made "from specific to general." Such ordering implies that the system is incomplete, but in this case incompleteness is an advantage. Simple ordering constraints are possible only with simple inferential powers in the system [Winograd 1978]. Further, there is as yet little formal theory on the effects of ordering rules on computational inference, although this has been an active topic [Reiter 1978].
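The "specific to general" ordering for the bridge net can be sketched as an inheritance lookup. The encoding below is only a guess at the structure of Fig. 12.3: ISA links, with a default property stored at the class and an exception stored at the instance. `Bridge01` is a hypothetical second bridge added for contrast.

```python
# A guess at the structure of the net in Fig. 12.3: ISA links plus properties
# stored at nodes. Bridge01 is a hypothetical second bridge for contrast.
ISA = {"Bridge01": "RailroadBridge", "Bridge02": "RailroadBridge",
       "RailroadBridge": "Bridge"}
PROPS = {"RailroadBridge": {"narrow": True},   # default for the class
         "Bridge02": {"narrow": False}}        # exception stored at the instance


def lookup(node, prop, specific_first=True):
    """Return the first value of prop found along the ISA chain from node."""
    chain = []
    while node is not None:
        chain.append(node)
        node = ISA.get(node)
    if not specific_first:
        chain.reverse()  # general-to-specific: the class default wins
    for n in chain:
        if prop in PROPS.get(n, {}):
            return PROPS[n][prop]
    return None  # nothing known


print(lookup("Bridge01", "narrow"))                        # True (inherited default)
print(lookup("Bridge02", "narrow"))                        # False (exception)
print(lookup("Bridge02", "narrow", specific_first=False))  # True: contradicts the exception
```

The contradictory facts are both in the net; only the search order decides which one is "proved," which is the sense in which the ordering makes the system deliberately incomplete.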

12.3 PRODUCTION SYSTEMS

The last section explored why the process of inference itself could be an important part of the semantics of a knowledge representation system. This idea is an important part of production systems. Perceived limitations in logic inference mechanisms and the seductive power of arbitrary algorithmic processes for inference have spawned the development of rule-based systems which differ from first order logic in the following respects:

1. Arbitrary additions and deletions to the clausal data base are allowed.
2. An interpreter that controls the inference process in special ways is usually an integral part of the system.

Early examples of systems with the first addition are STRIPS [Fikes and Nilsson 1971] and PLANNER [Hewitt 1972]. Later examples of systems with both additions are given in [Waterman and Hayes-Roth 1978]. The virtues of trying to control inferences may be appreciated after our brief introduction to clausal automatic theorem proving, where there are no very good semantic heuristics to guide inferences. However, the price paid for restricting the inference process is the loss of the formal properties of consistency and correctness, which are not guaranteed in rule-based systems. We shall look in some detail at a particular form of rule-based inference system called production systems.

Fig. 12.3 An inconsistent network.

A production system supports a general sort of "inference." Like resolution, it needs matching to identify which inference to make. It differs in that the action taken upon finding a matching data item is less constrained: actions of arbitrary complexity are allowed. A production system consists of an explicit set of situation-action nodes, which can be applied against a data base of situations. For example, in a very constrained visual domain the rule

$$(\mathrm{Green}(\mathrm{Region}\ X)) \Rightarrow (\mathrm{Grass}(\mathrm{Region}\ X)) \tag{12.11}$$

could directly infer the interpretation of a given region. Segmentation rules can also be developed; the following example merges two adjacent green regions into a single region.

$$(\mathrm{Green}(\mathrm{Region}\ X)) \wedge (\mathrm{Green}(\mathrm{Region}\ Y)) \wedge (\mathrm{Adjacent}((\mathrm{Region}\ X), (\mathrm{Region}\ Y))) \Rightarrow (\mathrm{Green}(\mathrm{Region}\ Z)) \wedge ((\mathrm{Region}\ Z) := (\mathrm{Union}(\mathrm{Region}\ X, \mathrm{Region}\ Y)))$$

These examples highlight several points. The first is that the basic idea of production systems is simple. The rules are easy to "read" by both the programmer and his program, and new rules are easily added. Although it is imaginable that "situations" could extend down to the pixel level, and production systems could be used (for instance) to find lines, the system overhead would render such an approach impractical. In the visual domain, the production system usually operates on the segmented image (Chapters 4 and 5) or with the high-level internal model. In the rules above, X and Y are variables that must be bound to specific instances of regions in a data base. This process of binding variables, or matching, can become very complex, and is one of the two central issues of this kind of inference. The other is how to choose rules from a set all of whose situations match the current situation to some degree.
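The two region rules above can be run by a toy recognize-act loop. This sketch assumes a data base of Python fact tuples and treats each rule as a function that either returns an updated data base or `None`. The region names, the `(R1+R2)` naming of merged regions, and the first-rule-in-list-order conflict policy are assumptions made for illustration.

```python
# Data base: a set of fact tuples such as ("Green", "R1") or ("Adjacent", "R1", "R2").
# Each rule inspects the data base and returns a new one, or None if it does not apply.


def grass_rule(db):
    """(Green(Region X)) => (Grass(Region X))"""
    for fact in sorted(db):
        if fact[0] == "Green" and ("Grass", fact[1]) not in db:
            return db | {("Grass", fact[1])}
    return None


def merge_rule(db):
    """Green X, Green Y, Adjacent X Y => Green Z, where Z = Union(X, Y)."""
    for fact in sorted(db):
        if fact[0] == "Adjacent" and ("Green", fact[1]) in db and ("Green", fact[2]) in db:
            x, y = fact[1], fact[2]
            z = f"({x}+{y})"

            def rename(r):
                return z if r in (x, y) else r

            new_db = set()
            for f in db:
                if f[0] == "Adjacent":
                    a, b = rename(f[1]), rename(f[2])
                    if a != b:  # drop Z's adjacency to itself
                        new_db.add(("Adjacent", a, b))
                elif f[1] not in (x, y):  # other facts about X and Y are deleted
                    new_db.add(f)
            return new_db | {("Green", z)}
    return None


def run(db, rules, max_cycles=100):
    """Fire the first applicable rule in list order until none applies."""
    for _ in range(max_cycles):
        for rule in rules:
            result = rule(db)
            if result is not None:
                db = result
                break
        else:
            return db  # quiescence: no rule changed the data base
    return db


db = {("Green", "R1"), ("Green", "R2"), ("Blue", "R3"),
      ("Adjacent", "R1", "R2"), ("Adjacent", "R2", "R3")}
print(sorted(run(db, [merge_rule, grass_rule])))
# [('Adjacent', '(R1+R2)', 'R3'), ('Blue', 'R3'), ('Grass', '(R1+R2)'), ('Green', '(R1+R2)')]
```

Listing `merge_rule` first means segmentation happens before interpretation; with the opposite order, both small regions would first be labeled grass and those labels would then be discarded by the merge.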

12.3.1 Production System Details

In its simplest form a production system has three basic components:

1. A data base
2. A set of rules
3. An interpreter for the rules

The vision data base is usually a set of facts that are known about the visual environment. Often the rules themselves are considered a manipulable part of the data base. Examples of some visual facts may be

(ABOVE (Region 5) (Region 10))
(SIZE (Region 5) 300)
(SKY (Region 5))
(TOP (Region 5) 255)          (12.12)

The data base is the sole storage medium for all state variables of the system. In particular, unlike procedurally oriented languages, there is no provision for separate storage of control state information: no separate program counter, push-down stack, and so on [Davis and King 1975].

A rule is an ordered pair of patterns with a left-hand side and a right-hand side. A pattern may involve only data base primitives, but it usually has variables and special forms as subpatterns which the interpreter matches against the data base. For example, applying the following rule to a data base which includes (12.12),

$$(\mathrm{TOP}\ (\mathrm{Region}\ X)\ (\mathrm{GreaterThan}\ 200)) \Rightarrow (\mathrm{SKY}\ (\mathrm{Region}\ X)) \tag{12.13}$$

region 5 can be inferred to be sky. The left-hand side matches a set of data base facts, and this causes (SKY (Region 5)) to be added to the data base. This example shows the kinds of matching that the interpreter must do: (1) the primitive TOP in the data base fact matches the same symbol in the rule, (2) (Region X) matches (Region 5) and X is bound to 5 as a side effect, and (3) (GreaterThan 200) matches 255. Naturally, the user must design his own interpreter to recognize the meaning of such operational subpatterns.

However, even the form of the rules outlined so far is relatively restrictive. There is no reason why the right-hand side cannot do almost arbitrary things. For instance, applying a rule may result in various productions being deleted from or added to the set of productions; the data base of productions and assertions can thus be adaptive [Waterman and Hayes-Roth 1978]. Also, the right-hand side may specify programs to be run, which can result in facts being asserted into the data base or in actions being performed.

Control in a basic production system is relatively simple: Rules are applied until some condition in the data base is reached. Rules may be applied in two distinct ways: (1) a match on the left-hand side of a rule may result in the consequences on the right-hand side being added to the data base, or (2) a match on the right-hand side may result in the antecedents on the left-hand side being added to the data base. Applying rules in the first way is termed forward chaining reasoning, where the objective is to see if a set of consequences can be derived from a given set of initial facts. The second way is known as backward chaining, where the objective is to determine a set of facts that could have produced a particular consequence.
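The three kinds of matching just described (literal symbols, variable binding as a side effect, and the operational subpattern `GreaterThan`), together with forward and backward application of a rule, can be sketched as follows. The nested-tuple encoding of facts and rules is a hypothetical stand-in for the list structures of the text. The `SKY` fact is left out of the starting data base so that rule (12.13) has something to derive.

```python
# Facts are nested tuples; pattern variables are strings beginning with "?".
FACTS = {
    ("ABOVE", ("Region", 5), ("Region", 10)),
    ("SIZE", ("Region", 5), 300),
    ("TOP", ("Region", 5), 255),
}
# Rule (12.13): (TOP (Region X) (GreaterThan 200)) => (SKY (Region X))
RULES = [([("TOP", ("Region", "?X"), ("GreaterThan", 200))], ("SKY", ("Region", "?X")))]


def match(pattern, datum, bindings):
    """Match one pattern against one datum; return extended bindings or None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == datum else None
        return {**bindings, pattern: datum}  # bind as a side effect
    if isinstance(pattern, tuple) and pattern[:1] == ("GreaterThan",):
        # Operational subpattern: a test on the datum, not a literal to compare
        ok = isinstance(datum, (int, float)) and datum > pattern[1]
        return bindings if ok else None
    if isinstance(pattern, tuple) and isinstance(datum, tuple) and len(pattern) == len(datum):
        for p, d in zip(pattern, datum):
            bindings = match(p, d, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == datum else None


def instantiate(pattern, bindings):
    """Replace bound variables in pattern by their values."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        return bindings[pattern]
    if isinstance(pattern, tuple):
        return tuple(instantiate(p, bindings) for p in pattern)
    return pattern


def match_all(conditions, facts, bindings):
    """Yield every set of bindings satisfying all conditions (a conjunction)."""
    if not conditions:
        yield bindings
        return
    for fact in facts:
        b = match(conditions[0], fact, bindings)
        if b is not None:
            yield from match_all(conditions[1:], facts, b)


def forward_chain(facts, rules):
    """Add instantiated right-hand sides of matching rules until nothing new appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for b in list(match_all(lhs, facts, {})):
                new = instantiate(rhs, b)
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts


def backward_step(goal, rules):
    """Match goal against right-hand sides and return the antecedents that
    would produce it (assumes every left-hand-side variable also occurs on the right)."""
    for lhs, rhs in rules:
        b = match(rhs, goal, {})
        if b is not None:
            return [instantiate(p, b) for p in lhs]
    return []


print(forward_chain(FACTS, RULES) - FACTS)          # {('SKY', ('Region', 5))}
print(backward_step(("SKY", ("Region", 7)), RULES))
# [('TOP', ('Region', 7), ('GreaterThan', 200))]
```

In the backward direction the operational subpattern survives as part of the new subgoal, since there is no datum yet to test it against.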
12.3.2 Pattern Matching

In the process of matching rules against the data base, several problems occur:

1. Many rule situations may match data base facts.
2. Rules designed for a specific context may not be appropriate for a larger context.
3. The pattern-matching process may become very expensive.
4. The data base or set of rules may become unmanageably large.

The problem of multiple matches is important. Early systems simply resolved it by scanning the data base linearly and choosing the first match, but this is an ineffective strategy for large data bases, and it has conceptual problems as well. Accordingly, strategies have evolved for dealing with these conflicts. Like most inference-controlling heuristics, their effectiveness can be domain-dependent, they can introduce incompleteness into the system, and so on.

On the principle of least commitment, when there are many chances of error, one strategy is to apply the most general rule, defined by some metric on the components of the pattern. One simple such metric is the number of elements in a pattern. Antithetical to this strategy is the heuristic of applying the most specific pattern. This may be appropriate where the likelihood of making a false inference is small, and where specific actions may be indicated (match (MAD DOG) with (MAD DOG), not with (DOG)). Another popular but inelegant technique is to exercise control over the order of production application by using state markers which are inserted into the data base by right-hand sides and looked for by left-hand sides.

1. $A \Rightarrow B \wedge \langle \text{marker 1} \rangle$
2. $A \Rightarrow B \wedge \langle \text{marker 2} \rangle$
3. $B \wedge \langle \text{marker 1} \rangle \Rightarrow C$
4. $B \wedge \langle \text{marker 2} \rangle \Rightarrow D$

Here if rule 1 is executed, "control goes to rule 3," i.e., rule 3 is now executable, whereas if rule 2 is applied, "control goes to rule 4." Similarly, such control paradigms as subroutining, iteration, and coroutining may be implemented with production systems [Rychner 1978].

The use of connectives and special symbols can make matching arbitrarily complex. Rules might be interpreted as allowing all partial matches in their antecedent clauses [Bajcsy and Joshi 1978]. Thus

$$(A\ B\ C) \Rightarrow (D)$$

is interpreted as

$$(A\ B\ C) \vee (B\ C) \vee (A\ B) \vee (A\ C) \vee (A) \vee (B) \vee (C) \Rightarrow (D)$$

where the leftmost actual match is used to compare the rule to others in the case of conflicts.

The problem of large data bases is usually overcome by structuring them in some way so that the interpreter applies the rules only to a subset of the data base or uses a subset of the rules. This structuring undermines a basic principle of pure rule-based systems: Control should depend on the contents of the data base alone. Nevertheless, many systems divide the data base into two parts: an active smaller part, which functions like the original data base but is restricted in size, and a larger data base, which is inaccessible to the rule set in the active smaller part. "Metarules" have actions that move situation-action rules and facts from the smaller data base to the larger one and vice versa. The incoming set of rules and facts is presumably the one applicable in the context indicated by the situation triggering the metarule.
This two-level organization of rules is used in "blackboard" systems, such as Hearsay for speech understanding [Erman and Lesser 1975]. The metarules seem to capture our idea of "mental set," or "context," or "frame" (Section 10.3.1, [Minsky 1975]). The two data bases are sometimes referred to as short-term memory and long-term memory, in analogy with certain models of human memory.
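The state-marker rules 1 to 4 can be exercised with a very small interpreter. The following Python sketch rests on two assumptions of ours that the text does not state. First, firing a rule rewrites working memory by removing its left-hand-side items and adding its right-hand-side items. Second, after the first choice, the controller fires the first applicable rule in list order. The rule names and the helpers `applicable`, `fire`, and `run` are hypothetical.

```python
# Working memory is a set of symbols. Firing a rule removes the rule's
# left-hand-side items and adds its right-hand-side items (rewriting
# semantics, assumed for this sketch).
RULES = [
    ("rule 1", {"A"}, {"B", "marker 1"}),
    ("rule 2", {"A"}, {"B", "marker 2"}),
    ("rule 3", {"B", "marker 1"}, {"C"}),
    ("rule 4", {"B", "marker 2"}, {"D"}),
]


def applicable(rules, memory):
    """Names of rules whose left-hand side is present in working memory."""
    return [name for name, lhs, _ in rules if lhs <= memory]


def fire(rules, memory, name):
    """Apply the named rule and return the rewritten working memory."""
    for rule_name, lhs, rhs in rules:
        if rule_name == name:
            return (memory - lhs) | rhs
    raise KeyError(name)


def run(rules, memory, first_choice):
    """Fire first_choice, then keep firing the first applicable rule."""
    trace = [first_choice]
    memory = fire(rules, memory, first_choice)
    while True:
        ready = applicable(rules, memory)
        if not ready:
            return memory, trace
        trace.append(ready[0])
        memory = fire(rules, memory, ready[0])


print(run(RULES, {"A"}, "rule 1"))  # rule 1 passes control to rule 3
print(run(RULES, {"A"}, "rule 2"))  # rule 2 passes control to rule 4
```

Because A is consumed when rule 1 fires, rule 3 is the only rule that can match afterward. The marker is the thing that carries "control" from one rule to the next, which is exactly why the technique is called inelegant: the ordering lives in data that the rules plant for each other.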


Ch.12 Inference

12.3.3 An Example

We shall follow the actions of a production system for vision [Sloan 1977; Sloan and Bajcsy 1979]. The intent here is to avoid a description of all the details (which may be found in the References) and to concentrate on the performance of the system as reflected by a sample of its output. The program uses a production system architecture in the domain of outdoor scenes. The goal is to determine basic features of the scene, particularly the separation between sky and ground. The interpreter is termed the "Observer," and the memory has a two-tiered structure: (1) short-term memory (STM) and (2) long-term memory (LTM), a data base of all facts ever known or established, structured to prefer access to the most recently used facts. The image to be analyzed is shown in Fig. 12.4, and the action may be followed in Fig. 12.5. The analysis starts with the initialization command

    *(look 100000 100 nil)

This command directs the Observer to investigate all regions that fall in the size range 100 to 100000, in decreasing order of size. The LTM is initialized to NIL.

    our first look at (region 11)
    x 35 y rg yb 2 24 29 wb size top 6 2132 35 bottom 97 left 2 right 127

This report is produced by an image processing procedure that produces assertions about (region 11). This region is shown highlighted in Fig. 12.5c.

Fig. 12.4 Outdoor scene to be analyzed with production system.

Sec. 12.3 Production Systems

Fig. 12.5 Images corresponding to steps in production system analysis. (a) Texture in the scene. (b) Region 11 outlined. (c) Sky-ground separation. (d) Skyline.

    Progress Report
    regions on this branch: (11)
    context stack: nil
    contents of short-term memory: ((far-left (region 11)) (far-right (region 11))
      (right (region 11) 127) (left (region 11) 2) (bottom (region 11) 97)
      (top (region 11) 35) (wb (region 11) minus) (yb (region 11) zero)
      (rg (region 11) zero) (size (region 11) 2132))
    end of progress report

Note that gray-level information is represented as a vector in opponent color space (Chapter 2), where the component axes are WHITE-BLACK (wb), RED-GREEN (rg), and YELLOW-BLUE (yb). Three values (plus, zero, minus) are used for each component. The display above is generated once after every iteration of the Observer. The report shows that (REGION 11) is being investigated, that there is no known context for this investigation, and that the information about (REGION 11) created by the image processing apparatus has been placed in STM. The context stack is for information only, and shows a trace of activated sets of rules.
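The STM contents in the report above are plain assertions. That means the BLACK production, which the Observer applies to them a few steps later in the trace, and the follow-on "black things are GROUND and SHADOW" rule can each be written as a simple matcher. The following Python sketch encodes assertions as `(attribute, region, value)` tuples. That encoding and the function names `black_rule` and `ground_rule` are hypothetical, not the Observer's actual data structures.

```python
# Hypothetical encoding of STM assertions as tuples (attribute, region, value),
# copied from the progress report for region 11.
STM = {
    ("wb", 11, "minus"), ("yb", 11, "zero"), ("rg", 11, "zero"),
    ("size", 11, 2132), ("top", 11, 35), ("bottom", 11, 97),
}


def black_rule(memory):
    """(wb x minus) & (rg x zero) & (yb x zero) -> (color x black)."""
    found = set()
    for fact in memory:
        if len(fact) == 3 and fact[0] == "wb" and fact[2] == "minus":
            region = fact[1]
            if ("rg", region, "zero") in memory and ("yb", region, "zero") in memory:
                found.add(("color", region, "black"))
    return found


def ground_rule(memory):
    """(color x black) -> (ground x) & (shadow x)."""
    found = set()
    for fact in memory:
        if len(fact) == 3 and fact[0] == "color" and fact[2] == "black":
            found |= {("ground", fact[1]), ("shadow", fact[1])}
    return found


colors = black_rule(STM)
print(colors)                              # {('color', 11, 'black')}
print(sorted(ground_rule(STM | colors)))   # [('ground', 11), ('shadow', 11)]
```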


    i think that (far-left (region 11))
    i think that (far-right (region 11))
    i think that (right (region 11) 127)
    i think that (left (region 11) 2)
    i think that (bottom (region 11) 97)
    i think that (top (region 11) 35)
    i think that (size (region 11) 2132)

This portion of the trace shows assertions moving from STM to LTM. They are reported because this is the first time they have been REMEMBERed (REMEMBER is a special procedure in the Observer).

    Progress Report
    regions on this branch: (11)
    context stack: nil
    contents of short-term memory: ((color (region 11) black))
    end of progress report

The assertions created from the region data structure have been digested, and lead only to the conclusion that (REGION 11) is BLACK, based on a production that looks like:

$$(\mathrm{wb}\ (\mathrm{region}\ x)\ \mathrm{minus}) \wedge (\mathrm{rg}\ (\mathrm{region}\ x)\ \mathrm{zero}) \wedge (\mathrm{yb}\ (\mathrm{region}\ x)\ \mathrm{zero}) \rightarrow (\mathrm{color}\ (\mathrm{region}\ x)\ \mathrm{black})$$

    Progress Report
    regions on this branch: (11)
    context stack: nil
    contents of short-term memory: ((ground (region 11)) (shadow (region 11)))
    end of progress report

The Observer knows that things that are black are GROUND and SHADOW. The facts it deduces about region 11 are again stored in the LTM.

Having discovered a piece of ground, the Observer has activated the GROUND-RULES and changed context. It now investigates the neighbors of (REGION 11).

    our first look at (region 16)
    x y rg yb wb 58 2 23 30 3 size 1833 top 57 bottom 119 left 2 right 127


(REGION 16) is a neighbor of (REGION 11), and the Observer is trying to determine whether or not they are sufficiently similar, in both color and texture, to justify merging them.

    Progress Report
    regions on this branch: (16 11)
    context stack: (ground)
    contents of short-term memory: ((texture-difference (region 16) (region 11))
      (color-similar (region 16) (region 11)) (distance (region 16) near)
      (ground (region 16)) (color (region 16) black))
    end of progress report

The Observer decides that (REGION 16) is ground because it is at the bottom of the picture.

The ground-growing process continues until finally one of the neighbors of a ground region is a piece of sky. The Observer will not immediately recognize this region as sky, but will see that a depth discontinuity exists and that the border between these two regions represents a section of three-dimensional skyline.

    our first look at (region 8)
    x y rg yb wb size 27 2 13 13 33 394 top 15 bottom 38 left 2 right 57

    Progress Report
    regions on this branch: (8 13 16 11)
    context stack: (ground ground ground)
    contents of short-term memory: ((new-neighbor (region800) (far-left (region 8))
      (right (region 8) 57) (left (region 8) 2) (bottom (region 8) 38)
      (top (region 8) 15) (wb (region 8) zero) (yb (region 8) minus)
      (rg (region 8) minus) (size (region 8) 394))
    end of progress report
    texture descriptors for (region 8) are (5450)
    texture descriptors for (region 13) are (4451)

Texture measurement is appropriate in the context of ground areas.


    Progress Report
    regions on this branch: (8 13 16 11)
    context stack: (ground ground ground)
    contents of short-term memory: ((texture-similar (region 8) (region 13))
      (color-difference (region 8) (region 13)) (color (region 8) blue-green))
    end of progress report

(REGION 8) passes the texture similarity test, but fails the color match.

    Progress Report
    regions on this branch: (8 13 16 11)
    context stack: (ground ground ground)
    contents of short-term memory: ((darker (region 13) (region 8))
      (brighter (region 8) (region 13)) (yellower (region 13) (region 8))
      (bluer (region 8) (region 13)) (redder (region 13) (region 8))
      (below (region 13) (region 8)) (above (region 8) (region 13)))
    end of progress report
    checking the border between (region 13) and (region 8)
    Progress Report
    regions on this branch: (8 13 16 11)
    context stack: (skyline ground ground ground)
    contents of short-term memory: ((segments-built)
      (skyline-segment ((11742)) (region 13) (region 8))
      (skyline-segment ((14 40) (13 40)) (region 13) (region 8)))
    end of progress report
    Progress Report
    regions on this branch: (8 13 16 11)
    context stack: (skyline ground ground ground)
    contents of short-term memory: ((peak (14 40)) (peak (17 42)))
    end of progress report

Two local maxima have been discovered in the skyline. On the basis of a depth judgment, these peaks are correctly identified as treetops.

The analysis continues until all the major regions have been analyzed. The sky-ground separation is shown in Fig. 12.5a and the skyline in Fig. 12.5e. In most cases, complete analysis of the image follows from the context established by the first (largest) region. This implies that initial scanning of such scenes can be quite coarse, and very simple ideas about gross context are enough to get started. Once started, inferences about local surroundings lead the Observer's attention over the entire scene, often returning many times to the same part of the image, each time with a bit more knowledge.
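The trace does not show how the Observer locates the two local maxima. The following Python sketch is an illustration only. It assumes the skyline has been reduced to one height per image column, with larger values meaning higher. That representation and the name `skyline_peaks` are our assumptions and are not described in the text. Flat-topped peaks are reported once, and the first and last columns are never reported.

```python
def skyline_peaks(heights):
    """Return column indices of local maxima of a skyline profile.

    heights[c] is the skyline height in image column c (larger means higher).
    A flat-topped peak is reported once, at the middle of its plateau.
    """
    peaks = []
    n = len(heights)
    i = 1
    while i < n - 1:
        if heights[i] > heights[i - 1]:
            # Walk to the end of any plateau that starts at column i.
            j = i
            while j + 1 < n and heights[j + 1] == heights[i]:
                j += 1
            # It is a peak only if the profile falls after the plateau.
            if j + 1 < n and heights[j + 1] < heights[i]:
                peaks.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return peaks


print(skyline_peaks([0, 2, 5, 3, 1, 4, 6, 2, 0]))  # [2, 6]
```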
12.3.4 Production System Pros and Cons

In their pure form, the productions of production systems are completely "modular," and are themselves independent of the control process. The data base of facts, or situations, is an unordered set accessed in undetermined order to find one matching some rule. The rule is applied, and the system repeats the search for a matching situation and situation-action pair (rule). This completely unstructured organization of knowledge could be a model for the human learning of "facts," which become available for use by some associative mechanism that finds relevant facts in our memories. The hope for pure production systems is that performance will degrade noncatastrophically from the deletion of rules or facts, and that the rules can interact in synergistic and surprising ways. A learning curve may be simulated by the addition of productions. Thus one is encouraged to experiment with how knowledge may best be broken up into disjoint fragments that interact to produce intelligent behavior.

Together with the modularity of productions in a simple system, there is a corresponding simplicity in the overall control program. The pure controller simply looks at the data base and somehow finds a matching situation (left-hand side) among the productions, applies the rule, and cycles. This simple structure remains constant no matter how the rules change, so any nondeterminism in the performance arises from the matcher, which may find different left-hand-side matches for sets of assertions in the data base.

The productions usually have a syntax that is machine-readable. Their semantics is similarly constrained, and so it begins to seem hopeful that a program (perhaps fired up by a production) could reason about the rules themselves, add them, modify them, or delete them. This is in contrast to the situation with procedurally embedded knowledge (Section 10.1.3), because it is difficult or impossible for programs to answer general questions about other programs. Thus the claim is that a production system can more easily reason about itself than can many other knowledge representation systems.
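The pure controller just described fits in a few lines. The following Python sketch makes two assumptions of ours. First, rules only add facts. Second, a rule counts as eligible only when its left-hand side matches and its right-hand side would add something new (a simple refractoriness test). The rules themselves and the name `recognize_act` are hypothetical, loosely modeled on the trace in Section 12.3.3. A random generator stands in for the matcher's arbitrary choice.

```python
import random

# Hypothetical monotonic rules (lhs, rhs): firing a rule only adds facts.
RULES = [
    (frozenset({"black"}), frozenset({"ground", "shadow"})),
    (frozenset({"ground", "bottom-of-image"}), frozenset({"near"})),
    (frozenset({"ground", "near"}), frozenset({"measure-texture"})),
]


def recognize_act(rules, facts, rng):
    """Pure controller: find some rule whose LHS matches, apply it, and cycle."""
    facts = set(facts)
    while True:
        # Eligible rules match the data base and would add at least one new fact.
        eligible = [(lhs, rhs) for lhs, rhs in rules
                    if lhs <= facts and not rhs <= facts]
        if not eligible:
            return facts
        _, rhs = rng.choice(eligible)  # the matcher's arbitrary choice
        facts |= rhs


# Different matchers fire rules in different orders but reach the same data base.
results = {frozenset(recognize_act(RULES, {"black", "bottom-of-image"}, random.Random(seed)))
           for seed in range(10)}
print(len(results))  # 1
```

With add-only rules, the final data base is the closure of the starting facts under the rules, so the matcher's nondeterminism shows up only in the firing order. Rules that delete facts, as in the state-marker rewriting, lose this guarantee.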


Productions often interact in ways that are not foreseen. This can be an advantage or a drawback, depending on the behavior desired. The pattern-matching control structure allows knowledge to be used whenever it is relevant, not only when the original designer thought that it might be. Symbiotic interaction of knowledge may also produce unforeseen insights. Production systems are a primary tool of knowledge engineering, an enterprise that attempts to encode and use expert knowledge at such tasks as medical diagnosis and interpretation of mass spectrograms [Lindsay et al. 1980; Buchanan and Mitchell 1978; Buchanan and Feigenbaum 1978; Shortliffe 1976; Aikins 1980].

There are many who are not convinced that production systems really offer the advantages they initially seem to. They use the following sorts of arguments.

The pure form of production system is almost never seen doing anything useful. In particular, the production system is most naturally a forward-chaining inference system, and one must exercise restraints and guidelines on it to keep it from running away and deducing lots of irrelevant facts instead of doing useful work. Of course, production systems may be written to do backward chaining by hypothesizing a RHS and seeing which LHS must be true for the desired RHS to occur (the process may be iterated to any depth). In practical systems based on production systems, there is implicit or explicit ordering of production rules, so the matcher tries them in some order. Often the ordering is determined in a rather complex and dynamic manner, with groups of related rules being more likely to be applied together, the most recently used rule not allowed to be reapplied immediately, and so on. In fact, many production systems' controllers have all the control structure tricks mentioned above (and more) built into them; the simple and elegant "bag of rules" ideal is inadequate for realistic examples. When the rules are explicitly written with an idiosyncratic control structure in mind, the system can become unprincipled and inexplicable.
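The backward-chaining strategy mentioned above works by hypothesizing a RHS and then trying to establish the LHS that would produce it, recursively. The following Python sketch shows that strategy. The rules are hypothetical, and so is the name `backward_chain`. The `pending` set is our addition: it stops cyclic rule sets from recursing forever.

```python
# Hypothetical rules (lhs, rhs) in the same style as a forward-chaining system.
RULES = [
    (frozenset({"black"}), frozenset({"ground", "shadow"})),
    (frozenset({"ground", "bottom-of-image"}), frozenset({"near"})),
    (frozenset({"ground", "near"}), frozenset({"measure-texture"})),
]


def backward_chain(goal, rules, facts, pending=frozenset()):
    """Prove goal by hypothesizing it as a RHS and proving that rule's LHS."""
    if goal in facts:
        return True
    if goal in pending:  # already trying to prove this goal: avoid looping forever
        return False
    pending = pending | {goal}
    return any(goal in rhs and all(backward_chain(sub, rules, facts, pending) for sub in lhs)
               for lhs, rhs in rules)


print(backward_chain("measure-texture", RULES, {"black", "bottom-of-image"}))  # True
print(backward_chain("near", RULES, {"black"}))  # False
```

Unlike the forward-chaining controller, this search touches only the rules that bear on the requested goal. That focus is the usual motivation for running a production system backward.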
On the same lines, notice how difficult it is to specify a time-ordered sequence of actions by a completely modular set of rewriting rules. It is unnatural to force knowledge about processes that may contain iteration, tests, and recursion into the form of independent situation-action rules. A view that is more easily defensible is that knowledge about procedures for perception should be encoded as (embedded in) computer procedures, not assertions or rules. The causal chain that dictates that some actions are best performed before others is implicit in the sequential execution of procedures, and the language constructs, such as iterate and test, test and branch, or subroutine invocation, are all fairly natural ways to think about solving certain problems. Production systems can in fact be made to perform all these procedure-like functions, but only through an abrogation of the ideal of modular, unordered, matching-oriented rule invocation, which is the production system ideal. The question turns into one of aesthetics: how to use productions in a good style, and to work with their philosophy instead of against it.

To summarize the previous two objections: Production-based knowledge systems may in practice be no more robust, easily modified, modular, extensible, understandable, or self-understanding than any other (say, procedural) system unless great care is taken. After a certain level of complexity is reached, they are likely to be as opaque as any other scheme because of the control-structuring methods that must be imposed on the pure production system form.


12.4 SCENE LABELING AND CONSTRAINT RELAXATION

The general computational problem of assigning labels consistently to objects is sometimes called the "labeling problem," and arises in many contexts, such as graph and automata homomorphism, graph coloring, Latin square generation, and of course, image understanding [Davis and Rosenfeld 1976; Zucker 1976; Haralick and Shapiro 1979]. "Relaxation labeling," "constraint satisfaction," and "cooperative algorithms" are natural implementations for labeling, and their potential parallelism has been a very influential development in computer vision. As any important development should, the relaxation paradigm has had an impact on the conceptualization as well as on the implementation of processes.

Cooperating algorithms to solve the labeling problem are useful in low-level vision (e.g., line finding, stereopsis) and in intermediate-level vision (e.g., line labeling, semantics-based region growing). They may also be useful for the highest-level vision programs, those that maintain a consistent set of beliefs about the world to guide the vision process.

Section 12.4.1 presents the main concepts in the labeling problem. Section 12.4.2 outlines some basic forms that "discrete labeling" algorithms can take. Section 12.4.3 introduces a continuing example, that of labeling lines in a line drawing, and gives a mathematically well-behaved probabilistic "linear operator" labeling method. Section 12.4.4 modifies the linear operator to be more in accord with our intuitions, and Section 12.4.5 describes relaxation as linear programming and optimization, thereby gaining additional mathematical rigor.
12.4.1 Consistent and Optimal Labelings

All labeling problems have the following notions.

1. A set of objects. In vision, the objects usually correspond to entities to be labeled, or assigned a "meaning."
2. A finite set of relations between objects. These are the sorts of relations we saw in Chapter 10; in vision, they are often geometric or topological relations between segments in a segmented image. Properties of objects are simply unary relations. An input scene is thus a relational structure.
3. A finite set of labels, or symbols associated with the "meanings" mentioned above. In the simplest case, each object is to be assigned a single label. A labeling assigns one or more labels to (a subset of) the objects in a relational structure. Labels may be weighted with "probabilities"; a (label, weight) pair can indicate something like the "probability of an object having that label."
4. Constraints, which determine what labels may be assigned to an object and what sets of labels may be assigned to objects in a relational structure.


A basic labeling problem is then: Given a finite input scene (relational structure of objects), a set of labels, and a set of constraints, find a "consistent labeling." That is, assign labels to objects without violating the constraints. We saw this problem in Chapter 11, where it appeared as a matching problem. Here we shall start with the discrete labeling of Chapter 11 and proceed to more general labeling schemes.

As a simple example, consider the indoor scene of Fig. 12.6. The segmented office scene is to have its regions labeled as Door, Wall, Ceiling, Floor, and Bin, with the obvious interpretation of the labels. Here are some possible constraints, informally stated. Note that these particular constraints are in terms of the input relational structure, not the world from which the structure arose. A more complex (but reasonable) situation arises if scene constraints must be derived from rules about the three-dimensional domain of the scene and the imaging process. Unary constraints use object properties to constrain labels; n-ary constraints force sets of label assignments to be compatible.

Unary constraints
1. The Ceiling is the single highest region in the image.
2. The Floor must be checkered.
[Figure 12.6, panels (a), (b), (c)]

Fig. 12.6 A stylized "segmented office scene." The regions are the objects to be assigned labels D, B, F, W, C (Door, Bin, Floor, Wall, Ceiling). In (a), each object is assigned all labels. In (b), unary constraints have been applied (see text). In (c), relational constraints have been applied, and a unique label for each region results.

Sec.12.4 Scene Labeling and Constraint Relaxation


N-ary constraints
3. A Wall is adjacent to the Floor and Ceiling.
4. A Door is adjacent to the Floor and a Wall.
5. A Bin is adjacent to a Floor.
6. A Bin is smaller than a Door.

Obviously, there are many constraints on the appearance of segments in such a scene. Which ones to use depends on the available sensors, the ease of computation of the relations, and their power in constraining the labeling. Here the application of the constraints (Fig. 12.6) results in a unique labeling. Although the constraints of this example are purely for illustration, a system that actually performs such labeling on real office scenes is described in [Barrow and Tenenbaum 1976].

Labelings may be characterized as inconsistent or consistent. A weaker notion is that of an optimal labeling. Each of these adjectives reflects a formalizable property of the labeling of a relational structure and the set of constraints. If the constraints admit only completely compatible or absolutely incompatible labels, then a labeling is consistent if and only if all its labels are mutually compatible, and inconsistent otherwise. One example is the line labels of Section 9.5: line drawings that could not be consistently labeled were declared "impossible." Such a black-and-white view of the scene interpretation problem is convenient and neat, but it is sometimes unrealistic. Recall that one of the problems with the line-labeling approach of Chapter 9 is that it does not cope gracefully with missing lines; strictly, missing lines often mean "impossible" line drawings. Such an uncompromising stance can be modified by introducing constraints that allow more degrees of compatibility than two (wholly compatible or strictly incompatible). Once this is done, both consistent and inconsistent labelings may be ranked on compatibility and likelihood. It is possible that a formally inconsistent labeling may rank better than a consistent but unlikely labeling.

Some examples are shown in Fig. 12.7. In Fig. 12.7b, the "inconsistent" labels are not nonsensical, but can only arise from a (very unlikely) accidental alignment of convex edges with three of the six vertices of a hexagonal hole in an occluding surface. The vertices that arise are not all included in the traditional catalog of legal vertices, hence the "inconsistent" labeling. The "floating cube" interpretation is consistent, but the "sitting cube" interpretation may be more likely if support and gravity are important concepts in the system. In Fig. 12.7c, the scene with a missing line cannot be consistently labeled according to the traditional vertex catalog, but the "inconsistent" labels shown are still the most likely ones.
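The office-scene constraints are simple enough to run mechanically. The following Python sketch is hypothetical throughout. The five regions, their areas, and their adjacencies are invented, because the actual regions of Fig. 12.6 are not reproduced here. "X is adjacent to a Floor" is read as "some neighbor of X may still be labeled Floor." All label sets are pruned at once from the previous iteration's sets, in the style of the parallel iterative algorithm described in Section 12.4.2.

```python
# Hypothetical five-region scene; these are NOT the regions of Fig. 12.6.
LABELS = {"D", "B", "F", "W", "C"}   # Door, Bin, Floor, Wall, Ceiling
REGIONS = {
    "r1": {"highest": True,  "checkered": False, "area": 30},
    "r2": {"highest": False, "checkered": False, "area": 50},
    "r3": {"highest": False, "checkered": True,  "area": 40},
    "r4": {"highest": False, "checkered": False, "area": 20},
    "r5": {"highest": False, "checkered": False, "area": 5},
}
ADJACENT = {
    "r1": {"r2"}, "r2": {"r1", "r3", "r4"}, "r3": {"r2", "r4", "r5"},
    "r4": {"r2", "r3"}, "r5": {"r3"},
}


def unary(region):
    """Unary constraints 1 and 2."""
    props = REGIONS[region]
    if props["highest"]:
        return {"C"}                    # 1: the Ceiling is the single highest region
    labels = LABELS - {"C"}
    if not props["checkered"]:
        labels -= {"F"}                 # 2: the Floor must be checkered
    return labels


def supported(region, label, current):
    """N-ary constraints 3-6, checked against the current label sets."""
    def neighbor_may_be(other_label):
        return any(other_label in current[n] for n in ADJACENT[region])

    if label == "W":                    # 3: Wall next to Floor and Ceiling
        return neighbor_may_be("F") and neighbor_may_be("C")
    if label == "D":                    # 4: Door next to Floor and Wall
        return neighbor_may_be("F") and neighbor_may_be("W")
    if label == "B":                    # 5: Bin next to Floor; 6: smaller than a Door
        bigger_door = any("D" in current[r] and REGIONS[r]["area"] > REGIONS[region]["area"]
                          for r in REGIONS if r != region)
        return neighbor_may_be("F") and bigger_door
    return True                         # F and C have no n-ary constraints here


def relax(current):
    """Parallel iteration: every label set is pruned using the previous sets."""
    while True:
        pruned = {r: {lab for lab in labs if supported(r, lab, current)}
                  for r, labs in current.items()}
        if pruned == current:
            return pruned
        current = pruned


final = relax({r: unary(r) for r in REGIONS})
print(final)
```

Each pass can only delete labels, so the loop must stop. On this invented scene it stops after four passes with a single label per region.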
12.4.2 Discrete Labeling Algorithms

[Figure 12.7: columns "Scene," "Consistent labels," "Optimal labeling"; rows (a), (b), (c); scene (a) contains regions labeled Road, Grass, and Trees.]

Fig. 12.7 Three scenes (A, B, C) and their labelings. Labelings are only "consistent," "inconsistent," or "optimal" with respect to a given relational structure of objects (an input scene) and a set of constraints. These examples are meant to be illustrative only.

Let us consider the problem of finding a consistent set of labels, taken from a discrete finite set. This problem may be placed in an abstract algebraic context [Haralick and Kartus 1978; Haralick 1978; Haralick et al. 1978]. Perhaps the simplest way to find a consistent labeling of a relational structure (we shall often say "labeling of a scene") is to apply a depth-first tree search of the labeling possibilities, as in the backtracking algorithm (11.1).

    Label an object in accordance with unary constraints.
    Iterate until a globally consistent labeling is found:
        Given the current labeling, label another object
        consistently in accordance with all constraints.
        If the object cannot be labeled consistently, backtrack
        and pick a new label for a previously labeled object.

This labeling algorithm can be computationally inefficient. First, it does not prune the search tree very effectively. Second, if it is used to generate all consistent labelings, it does not recognize important independences in the labels. That is, it does not notice that conclusions reached (labels assigned) in part of the tree search are usable in other parts without recomputation.

In a serial relaxation, the labels are changed one object at a time. After each such change, the new labeling is used to determine which object to process next. This technique has proved useful in some applications [Feldman and Yakimovsky 1974].

    Assign all possible labels to each object in accordance with
    unary constraints.
    Iterate until a globally consistent labeling is found:
        Somehow pick an object to be processed.
        Modify its labels to be consistent with the current labeling.

A parallel iterative algorithm adjusts all object labels at once; we have seen this approach in several places, notably in the "Waltz filtering algorithm" of Section 9.5.

    Assign all possible labels to each object in accordance with
    unary constraints.
    Iterate until a globally consistent labeling is found:
        In parallel, eliminate from each object's label set those
        labels that are inconsistent with the current labels of
        the rest of the relational structure.

A less structured version of relaxation occurs when the iteration is replaced with an asynchronous interaction of labeled objects. Such interaction may be implemented with multiple cooperating processes or in a data base with "demons" (Appendix 2). This method of relaxation was used in MSYS [Barrow and Tenenbaum 1976]. Here imagine that each object is an active process that knows its own label set and also knows about the constraints, so that it knows about its relations with other objects. The program of each object might look like this.

    If I have just been activated, and my label set is not
    consistent with the labels of other objects in the
    relational structure, then I change my label set to be
    consistent, else I suspend myself.
    Whenever I change my label set, I activate other objects
    whose label sets may be affected, then I suspend myself.

To use such a set of active objects, one can give each one all possible labels consistent with the unary constraints, establish the constraints so that the objects know where and when to pass on activity, and activate all objects.

Constraints involving arbitrarily many objects (i.e., constraints of arbitrarily high order) can efficiently be relaxed by recording acceptable labelings in a graph structure [Freuder 1978]. Each object to be labeled initially corresponds to a node in the graph, which contains all legal labels according to unary constraints. Higher-order constraints involving more and more nodes are incorporated successively as new nodes in the graph. At each step the new node constraint is propagated; that is, the graph is checked to see if it is consistent with the new constraint. With the introduction of more constraints, node pairings that were previously consistent may be found to be inconsistent. As an example consider the following graph coloring problem: color the graph in Fig. 12.8 so that neighboring nodes have different colors. It is solved by building constraints of increasingly higher order and propagating them. The node constraints are given explicitly as shown in Fig. 12.8a, but the higher-order constraints are given in functional implicit form; prospective colorings must be tested to see if they satisfy the constraints. After the node constraints are given, order-two constraints are synthesized as follows: (1) make a node for each node pairing; (2) add all labelings that satisfy the constraint. The result is shown in Fig. 12.8b. The single constraint of order three is synthesized in the same way, but now the graph is inconsistent: the match "Y,Z: Red, Green" is ruled out by the third-order legal label set (RGY, GRY). To restore consistency, the constraint is propagated through node (Y,Z) by deleting the inconsistent labelings. This means that the node constraint for node Z is now inconsistent. To remedy this, the constraint is propagated again by deleting the inconsistency, in this case the labeling (Z: G). The change is propagated to node (X,Z) by deleting (X,Z: Red, Green), and finally the network is consistent.

In this example constraint propagation did not occur until constraints of order three were considered. Normally, some constraint propagation occurs after every order greater than one. Of course it may be impossible to find a consistent graph. This is the case when the labels for node Z in our example are changed from (G,Y) to (G,R). Inconsistency is then discovered at order three.

[Figure 12.8]

Fig. 12.8 Coloring a graph by building constraints of increasingly higher order.

It is quite possible that a discrete labeling algorithm will not yield a unique label for each object. In this case, a consistent labeling exists using each label for the object. However, which of an object's multiple labels goes with which of another object's multiple labels is not determined. The final enumeration of consistent labelings usually proceeds by tree search over the reduced set of possibilities remaining after the relaxation.

Convergence properties of relaxation algorithms are important; convergence means that in some finite time the labeling will "settle down" to a final value. In discrete labeling, constraints may often be written so that the label adjustment phase always reduces the number of labels for an object (inconsistent ones are eliminated). In this case the algorithm clearly must converge in finite time to a consistent labeling, since for each object the label set must either shrink or stay stable. In schemes where labels are added, or where labels have complex structure (such as real number "weights" or "probabilities"), convergence is often not guaranteed mathematically, though such schemes may still be quite useful. Some probabilistic labeling schemes (Section 12.4.3) have provably good convergence properties.
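The following Python sketch applies the synthesize-and-propagate idea to the graph-coloring example, but only on a small scale. It is not Freuder's published algorithm. Fig. 12.8 is not reproduced here, so the node constraints are inferred from the text: X and Y may be red or green, and Z may be green or yellow. The graph is assumed to be a triangle. The names `synthesize`, `propagate`, and `_project` are hypothetical. Each constraint node of order k holds the tuples of colors still legal for that k-subset of graph nodes.

```python
from itertools import combinations, product


def _project(row, positions):
    """Restrict a tuple of colors to the given positions."""
    return tuple(row[i] for i in positions)


def propagate(table):
    """Delete labelings with no extension one order up, or whose projection one
    order down is gone, until the whole table is consistent."""
    changed = True
    while changed:
        changed = False
        for sup in list(table):
            if len(sup) < 2:
                continue
            for sub in combinations(sup, len(sup) - 1):
                positions = [sup.index(node) for node in sub]
                keep_sub = table[sub] & {_project(row, positions) for row in table[sup]}
                keep_sup = {row for row in table[sup] if _project(row, positions) in table[sub]}
                if keep_sub != table[sub] or keep_sup != table[sup]:
                    table[sub], table[sup] = keep_sub, keep_sup
                    changed = True


def synthesize(domains, edges):
    """Add constraint nodes of order 1, 2, ..., n, propagating after each order."""
    nodes = sorted(domains)
    table = {(n,): {(c,) for c in domains[n]} for n in nodes}
    for order in range(2, len(nodes) + 1):
        for subset in combinations(nodes, order):
            legal = set()
            for labels in product(*(sorted(domains[n]) for n in subset)):
                colors = dict(zip(subset, labels))
                if any(colors[a] == colors[b] for a, b in edges
                       if a in colors and b in colors):
                    continue  # neighboring nodes must get different colors
                if all(_project(labels, [subset.index(n) for n in sub]) in table[sub]
                       for sub in combinations(subset, order - 1)):
                    legal.add(labels)
            table[subset] = legal
        propagate(table)
    return table


# Node constraints as inferred from the text; the graph is assumed to be a triangle.
domains = {"X": {"red", "green"}, "Y": {"red", "green"}, "Z": {"green", "yellow"}}
edges = [("X", "Y"), ("X", "Z"), ("Y", "Z")]
table = synthesize(domains, edges)
print(table[("Z",)])               # {('yellow',)}
print(sorted(table[("X", "Z")]))   # [('green', 'yellow'), ('red', 'yellow')]
```

As in the text, nothing is deleted at order two. At order three, the legal set is {RGY, GRY}. Propagation then removes "Y,Z: Red, Green," the label Z: G, and "X,Z: Red, Green." If Z's labels are changed to green or red, the order-three node is empty, and propagation empties every other node too.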


It is possible to use relaxation schemes without really considering their mathematical convergence properties, their semantics (What is the semantics of weights attached to labels: are they probabilities?), or a clear definition of what exactly the relaxation is to achieve (What is a good set of labels?). The fact that some schemes can be shown to have unpleasant properties (such as assigning nonzero weights to each of two inconsistent hypotheses, or not always converging to a solution) does not mean that they cannot be used. It only means that their behavior is not formally characterizable or possibly even predictable. As relaxation computations become more common, the less formalizable, less predictable, and less conceptually elegant forms of relaxation computations will be replaced by better-behaved, more thoroughly understood schemes.

12.4.3 A Linear Relaxation Operator and a Line Labeling Example

The Formulation

We now move away from discrete labeling and into the realm of continuous weights or supposition values on labels. In Sections 12.4.3 and 12.4.4 we follow closely the development of [Rosenfeld et al. 1976]. Let us require that the label weights for each object sum to unity. Then the weights are reminiscent of probabilities, reflecting the "probability that the label is correct." When the labeling algorithm converges, a label emerges with a high weight if it occurs in a probable labeling of the scene. Weights, or supposition values, are in fact hard to interpret consistently as probabilities, but they are suggestive of likelihoods and often can be manipulated like them.

In what follows $p$ refers to probability-like weights (supposition values) rather than to the value of a probability density function. Let a relational structure with $n$ objects be given by $a_i$, $i = 1, \ldots, n$, each with $m$ discrete labels $\lambda_1, \ldots, \lambda_m$. The shorthand $p_i(\lambda)$ denotes the weight, or (with the above caveats) the "probability," that the label $\lambda$ (actually $\lambda_k$ for some $k$) is correct for the object $a_i$. Then the probability axioms lead to the following constraints:

$$0 \le p_i(\lambda) \le 1 \tag{12.14}$$

$$\sum_{\lambda} p_i(\lambda) = 1 \tag{12.15}$$

The labeling process starts with an initial assignment of weights to all labels for all objects [consistent with Eqs. (12.14) and (12.15)]. The algorithm is parallel iterative: It transforms all weights at once into a new set conforming to Eqs. (12.14) and (12.15), and repeats this transformation until the weights converge to stable values.

Consider the transformation as the application of an operator to a vector of label weights. This operator is based on the compatibilities of labels, which serve as constraints in this labeling algorithm. A compatibility $p_{ij}$ looks like a conditional probability.

$$\sum_{\lambda} p_{ij}(\lambda \mid \lambda') = 1 \quad \text{for all } i, j, \lambda' \tag{12.16}$$


$$p_{ii}(\lambda \mid \lambda') = 1 \ \text{iff}\ \lambda = \lambda', \ \text{else } 0 \tag{12.17}$$

The $p_{ij}(\lambda \mid \lambda')$ may be interpreted as the conditional probability that object $a_i$ has label $\lambda$ given that another object $a_j$ has label $\lambda'$. These compatibilities may be gathered from statistics over a domain, or may reflect a priori belief or information.

The operator iteratively adjusts label weights in accordance with other weights and the compatibilities. A new weight $p_i(\lambda)$ is computed from old weights and compatibilities as follows.

$$p_i(\lambda) := \sum_j c_{ij} \left[ \sum_{\lambda'} p_{ij}(\lambda \mid \lambda')\, p_j(\lambda') \right] \tag{12.18}$$

The $c_{ij}$ are coefficients such that

$$\sum_j c_{ij} = 1 \tag{12.19}$$

In Eq. (12.18), the inner sum is the expectation that object $a_i$ has label $\lambda$, given the evidence provided by object $a_j$. $p_i(\lambda)$ is thus a weighted sum of these expectations, and the $c_{ij}$ are the weights for the sum.

To run the algorithm, simply pick the $p_{ij}$ and $c_{ij}$, and apply Eq. (12.18) repeatedly to the $p_i$ until they stop changing. Equation (12.18) is in the form of a matrix multiplication on the vector of weights, as shown below; the matrix elements are weighted compatibilities, the $c_{ij} p_{ij}$. The relaxation operator is thus a matrix. If it is partitioned into several component matrices, one for each set of noninteracting weights, linear algebra yields proofs of convergence properties [Rosenfeld et al. 1976]. The iteration for the reduced matrix for each component does converge, and it converges to the weight vector that is the eigenvector of the matrix with eigenvalue unity. This final weight vector is independent of the initial assignments of label weights; we shall say more about this later.

An Example

Let us consider the input line drawing scene of Fig. 12.9a used in [Rosenfeld et al. 1976]. The line labels given in Section 9.5 allow several consistent labels, as shown in Fig. 12.9b-e, each with a different physical interpretation.

In the discrete labeling "filtering" algorithm presented in Section 9.5 and outlined in the preceding section, the relational structure is imposed by the neighbor relation between vertices induced by their sharing a line. Unary constraints are imposed through a catalog of legal combinations of line labels at vertices, and the binary constraint is that a line must not change its label between vertices. The algorithm eliminates inconsistent labels.

Let us try to label the sides of the triangle $a_1$, $a_2$, and $a_3$ in Fig. 12.9 with the solid object edge labels {>, <, +, −}. To do this requires some "conditional probabilities" for compatibilities $p_{ij}(\lambda \mid \lambda')$, so let us use those that arise if all eight interpretations of Fig. 12.9 are equally likely. Remembering that

$$p(X \mid Y) = \frac{p(X, Y)}{p(Y)} \tag{12.20}$$

and taking $p(X, Y)$ to mean the probability that labels X and Y occur consecutively in clockwise order around the triangle, one can derive Table 12.2. Of course, we could choose other compatibilities based on any considerations whatever, as long as Eqs. (12.16) and (12.17) are preserved.

[Figure 12.9, panels (a) to (e)]

Fig. 12.9 A triangle and its possible labels. (a) Edge names. (b) Floating. (c) Flap folded up. (d) Triangular hole. (e) Flap folded down.

Table 12.2 shows that there are two noninteracting components, {−, >} and {+, <}. Consider the first component, which consists of the weight vector

$$\left[\, p_1(>),\ p_1(-),\ p_2(>),\ p_2(-),\ p_3(>),\ p_3(-) \,\right] \tag{12.21}$$

The second is treated similarly. This vector describes weights for the subpopulation of labelings given by Fig. 12.9b and c. The matrix M of compatibilities has columns of weighted $p_{ij}$.
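$$M = \begin{pmatrix} c_{11} P_{11} & c_{12} P_{12} & c_{13} P_{13} \\ c_{21} P_{21} & c_{22} P_{22} & c_{23} P_{23} \\ c_{31} P_{31} & c_{32} P_{32} & c_{33} P_{33} \end{pmatrix}, \qquad P_{ij} = \begin{pmatrix} p_{ij}(> \mid >) & p_{ij}(> \mid -) \\ p_{ij}(- \mid >) & p_{ij}(- \mid -) \end{pmatrix} \tag{12.22}$$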

Sec. 12.4 Scene Labeling and Constraint Relaxation


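Table 12.2

$$\begin{array}{cccc}
\lambda_1 & \lambda_2 & p(\lambda_1, \lambda_2) & p(\lambda_1 \mid \lambda_2) \\ \hline
> & > & 1/4 & 2/3 \\
> & < & 0 & 0 \\
> & + & 0 & 0 \\
> & - & 1/8 & 1 \\
< & > & 0 & 0 \\
< & < & 1/4 & 2/3 \\
< & + & 1/8 & 1 \\
< & - & 0 & 0 \\
+ & > & 0 & 0 \\
+ & < & 1/8 & 1/3 \\
+ & + & 0 & 0 \\
+ & - & 0 & 0 \\
- & > & 1/8 & 1/3 \\
- & < & 0 & 0 \\
- & + & 0 & 0 \\
- & - & 0 & 0
\end{array}$$

If we let $c_{ij} = 1/3$ for all $i, j$, then

$$M = \frac{1}{3}\begin{bmatrix}
1 & 0 & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} \\
0 & 1 & 1 & 0 & 1 & 0 \\
\tfrac{2}{3} & \tfrac{1}{3} & 1 & 0 & \tfrac{2}{3} & \tfrac{1}{3} \\
1 & 0 & 0 & 1 & 1 & 0 \\
\tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1
\end{bmatrix} \qquad (12.23)$$

An analytic eigenvector calculation (Appendix 1) shows that the $M$ of Eq. (12.23) yields (for any initial weight vector) the final weight vector of

$$\left[\tfrac{3}{4},\ \tfrac{1}{4},\ \tfrac{3}{4},\ \tfrac{1}{4},\ \tfrac{3}{4},\ \tfrac{1}{4}\right] \qquad (12.24)$$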

Thus each line of the population in the component we chose (Fig. 12.9b and c) has label $>$ with "probability" 3/4, and $-$ with "probability" 1/4. In other words, from an initial assumption that all labelings in Fig. 12.9b and c were equally likely, the system of constraints has "relaxed" to the state where the "most likely" labeling is that of Fig. 12.9b, the floating triangle.

This relaxation method is a crisp mathematical technique, but it has some drawbacks. It has good convergence properties, but it converges to a solution entirely determined by the compatibilities, leaving no room for preferences or local scene evidence to be incorporated and affect the final weights. Further, the algorithm perhaps does not exactly mirror the following intuitions about how relaxation should work.


1. Increase $p_i(\lambda)$ if high-probability labels for other objects are compatible with assignment of $\lambda$ to $a_i$.
2. Decrease $p_i(\lambda)$ if high-probability labels are incompatible with the assignment of $\lambda$ to $a_i$.
3. Labels with low probability, compatible or incompatible, should have little influence on $p_i(\lambda)$.

However, the operator of this section decreases $p_i(\lambda)$ the most when other labels have both low compatibility and low probability. Thus it accords with (1) above, but not with (2) or (3). Some of these difficulties are addressed in the next section.
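As a check on Eqs. (12.22)-(12.24), the linear operator can be run on the $\{>, -\}$ component in a few lines of Python. This is a minimal sketch, not the implementation of [Rosenfeld et al. 1976]. It assumes that the four labelings of this component are the floating triangle plus three labelings with one $-$ line and two $>$ lines (which is what Table 12.2 implies), counts $p_{ij}(\lambda \mid \lambda')$ from them, and takes $c_{ij} = 1/3$ with $p_{ii}(\lambda \mid \lambda') = 1$ exactly when $\lambda = \lambda'$, as the diagonal blocks of Eq. (12.23) suggest. The names `INTERPRETATIONS`, `conditional`, `step`, and `relax` are illustrative.

```python
# Linear relaxation operator (Eq. 12.18) on the {>, -} component of the
# triangle example. Assumptions: c_ij = 1/3 for all i, j, and
# p_ii(l | l') = 1 if l == l' else 0.

LABELS = (">", "-")

# Labels of (a1, a2, a3) in the four equally likely labelings of this
# component: the floating triangle and three rotations with one "-" line.
INTERPRETATIONS = [
    (">", ">", ">"),
    ("-", ">", ">"),
    (">", "-", ">"),
    (">", ">", "-"),
]


def conditional(i, lam, j, lam_p):
    """p_ij(lam | lam_p): fraction of labelings with a_j = lam_p having a_i = lam."""
    if i == j:
        return 1.0 if lam == lam_p else 0.0
    given = [t for t in INTERPRETATIONS if t[j] == lam_p]
    return sum(t[i] == lam for t in given) / len(given)


C = 1.0 / 3.0
STATES = [(i, lam) for i in range(3) for lam in LABELS]  # order of Eq. (12.21)

# Row (j, lam'), column (i, lam) holds c_ij * p_ij(lam | lam'), as in Eq. (12.22).
M = [[C * conditional(i, lam, j, lam_p) for (i, lam) in STATES]
     for (j, lam_p) in STATES]


def step(p):
    """One application of Eq. (12.18): the row vector p times M."""
    n = len(p)
    return [sum(p[r] * M[r][col] for r in range(n)) for col in range(n)]


def relax(p, iterations=100):
    """Apply Eq. (12.18) a fixed number of times."""
    for _ in range(iterations):
        p = step(p)
    return p


if __name__ == "__main__":
    for start in ([0.5] * 6, [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]):
        print([round(x, 4) for x in relax(start)])
```

Both starting vectors settle at the weights of Eq. (12.24); the starting weights have no influence on the outcome, which is the drawback discussed above.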
12.4.4 A Nonlinear Operator

The Formulation

If compatibilities are allowed to take on both positive and negative values, then we can express strong incompatibility better and obtain behavior more like (1), (2), and (3) just above. Denote the compatibility of the event "label $\lambda$ on $a_i$" with the event "label $\lambda'$ on $a_j$" by $r_{ij}(\lambda, \lambda')$. If the two events occur together often, $r_{ij}$ should be positive. If they occur together rarely, $r_{ij}$ should be negative. If they are independent, $r_{ij}$ should be 0. The correlation coefficient behaves like this, and the compatibilities of this section are based on correlations (hence the notation $r_{ij}$ for compatibilities). The correlation is defined using the covariance.

$$\operatorname{cov}(X, Y) = p(X, Y) - p(X)\,p(Y) \qquad (12.25)$$

Now define a quantity $\sigma$ which is like the standard deviation

$$\sigma(X) = \left[p(X) - (p(X))^2\right]^{1/2}$$

then the correlation is the normalized covariance

$$\operatorname{cor}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\,\sigma(Y)} \qquad (12.26)$$

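This allows the formulation of an expression precisely analogous to Eq. (12.18), only that $r_{ij}$ instead of $p_{ij}$ is used to obtain a means of calculating the positive or negative change in weights.

$$q_i^{(k)}(\lambda) = \sum_j c_{ij} \Big[\sum_{\lambda'} r_{ij}(\lambda, \lambda')\, p_j^{(k)}(\lambda')\Big] \qquad (12.27)$$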

In Eqs. (12.27)-(12.29) the superscripts indicate iteration numbers. The weight change (Eq. 12.27) could be applied as follows,

$$p_i^{(k+1)}(\lambda) = p_i^{(k)}(\lambda) + q_i^{(k)}(\lambda) \qquad (12.28)$$

but then the resultant label weights might not remain nonnegative. Fixing this in a straightforward way yields the iteration equation

$$p_i^{(k+1)}(\lambda) = \frac{p_i^{(k)}(\lambda)\,\big[1 + q_i^{(k)}(\lambda)\big]}{\sum_{\lambda} p_i^{(k)}(\lambda)\,\big[1 + q_i^{(k)}(\lambda)\big]} \qquad (12.29)$$


The convergence properties of this operator seem to be unknown, and like the linear operator it can assign nonzero weights to maximally incompatible labelings. However, its behavior can accord with intuition, as the following example shows.
An Example

Computing the covariances and correlations for the set of labels of Fig. 12.9b-e yields Table 12.3. Figure 12.10 shows the nonlinear operator of Eq. (12.29) operating on the example of Fig. 12.9, in several cases.
1. Equal initial weights: convergence to a priori probabilities (3/8 for $>$ and $<$, 1/8 for $+$ and $-$).
2. Equal weights in the component $\{>, -\}$: convergence to the "most probable" floating triangle labeling.
3. Slight bias toward a flap labeling is not enough to overcome convergence to the "most probable" labeling, as in (2).
4. Like (3), but greater bias elicits the "improbable" labeling.
5. Contradictory biases toward "improbable" labelings: convergence to the "most probable" labeling instead.
6. Like (5), but stronger bias toward one "improbable" labeling elicits it.
7. Bias toward one of the components $\{>, -\}$, $\{<, +\}$ converges to the most probable labeling in that component.
8. Like (7), only biased to a less probable labeling in a component.
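The correlations of Table 12.3 and the update of Eq. (12.29) can be sketched as follows. Several assumptions here do not come from the text: the eight equally likely labelings are written out explicitly (the floating triangle, the hole, and three rotations of each flap, consistent with Table 12.2), each line weighs its two neighbors with $c_{ij} = 1/2$, and every pair of neighboring lines uses the same correlations. With these choices the sketch need not reproduce the numbers of Fig. 12.10; all function names are hypothetical.

```python
import math

# Compatibilities r(l, l') from Eqs. (12.25)-(12.26) and the nonlinear
# operator of Eq. (12.29) for the triangle of Fig. 12.9.
# Assumptions: the eight labelings below are equally likely, and each line
# weighs its two neighbours equally (c_ij = 1/2 for j != i).

LABELS = (">", "<", "+", "-")
INTERPRETATIONS = [
    (">", ">", ">"), ("<", "<", "<"),                    # floating, hole
    ("-", ">", ">"), (">", "-", ">"), (">", ">", "-"),   # one flap, 3 ways
    ("+", "<", "<"), ("<", "+", "<"), ("<", "<", "+"),   # other flap, 3 ways
]
N = len(INTERPRETATIONS)


def p(x):
    """p(X): probability that a line (here a1) has label x."""
    return sum(t[0] == x for t in INTERPRETATIONS) / N


def p_pair(x, y):
    """p(X, Y): label x on a1 and y on a2, consecutive in clockwise order."""
    return sum(t[0] == x and t[1] == y for t in INTERPRETATIONS) / N


def cov(x, y):
    return p_pair(x, y) - p(x) * p(y)                   # Eq. (12.25)


def sigma(x):
    return math.sqrt(p(x) - p(x) ** 2)


def cor(x, y):
    return cov(x, y) / (sigma(x) * sigma(y))             # Eq. (12.26)


def update(weights, c=0.5):
    """One iteration of Eqs. (12.27) and (12.29) over the three lines."""
    new = []
    for i in range(3):
        q = {lam: sum(c * sum(cor(lam, mu) * weights[j][mu] for mu in LABELS)
                      for j in range(3) if j != i)
             for lam in LABELS}                           # Eq. (12.27)
        raw = {lam: weights[i][lam] * (1.0 + q[lam]) for lam in LABELS}
        total = sum(raw.values())
        new.append({lam: raw[lam] / total for lam in LABELS})  # Eq. (12.29)
    return new


if __name__ == "__main__":
    print(round(cor(">", ">"), 4), round(cor(">", "<"), 4))
    # Case 2 of Fig. 12.10: equal weights in the component {>, -}.
    w = [{">": 0.5, "<": 0.0, "+": 0.0, "-": 0.5} for _ in range(3)]
    for _ in range(200):
        w = update(w)
    print([round(line[">"], 3) for line in w])
```

In case 2 the weight on $>$ rises well above its starting value while $<$ and $+$ stay at exactly zero, because Eq. (12.29) only rescales an existing weight and never creates one from zero.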
12.4.5 Relaxation as Linear Programming

The Idea

Linear programming (LP) provides some useful metaphors for thinking about relaxation computations, as well as actual algorithms and a rigorous basis [Hummel and Zucker 1980]. In this section we follow the development of [Hinton 1979].
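Table 12.3

$$\begin{array}{cccc}
\lambda_1 & \lambda_2 & \operatorname{cov}(\lambda_1, \lambda_2) & \operatorname{cor}(\lambda_1, \lambda_2) \\ \hline
> & > & 7/64 & 7/15 \\
> & < & -9/64 & -3/5 \\
> & + & -3/64 & -3/\sqrt{105} \\
> & - & 5/64 & 5/\sqrt{105}
\end{array}$$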

Fig. 12.10 The nonlinear operator produces labelings for the triangle in (a); (b) shows how the label weights are displayed, and (c) shows a number of cases (see text). [Panel (c) lists, for cases (1)-(8), the initial weights and the weights after 2 to 3 iterations, after 20 to 30 iterations, and in the limit.]

To put relaxation in terms of linear programming, we use the following translations.

• LABEL WEIGHT VECTORS $\Rightarrow$ POINTS IN EUCLIDEAN N-SPACE. Each possible assignment of a label to an object is a hypothesis, to which a weight (supposition value) is to be attached. With N hypotheses, an N-vector of weights describes a labeling. We shall call this vector a (hypothesis or label) weight vector. For $m$ labels and $n$ objects, we need at most Euclidean $mn$-space.

• CONSTRAINTS $\Rightarrow$ INEQUALITIES. Constraints are mapped into linear inequalities in hypothesis weights, by way of various identities like those of "fuzzy logic" [Zadeh 1965]. Each inequality determines an infinite half-space. The weight vectors within this half-space satisfy the constraint. Those outside do not. The convex solid that is the set intersection of all the half-spaces includes those weight vectors that satisfy all the constraints: each represents a "consistent" labeling. In linear programming terms, each such weight vector is a feasible solution. We thus have the usual geometric interpretation of the linear programming problem, which is to find the best (optimal) consistent (feasible) labeling (solution, or weight vector). Solutions should have integer-valued (1- or 0-valued) weights indicating convergence to actual labelings, not probabilistic ones such as those of Section 12.4.3, or the one shown in Fig. 12.10c, case 1.

• HYPOTHESIS PREFERENCES $\Rightarrow$ PREFERENCE VECTOR. Often some hypotheses (label assignments) are preferred to others, on the basis of a priori knowledge, image evidence, and so on. To express this preference, make an N-dimensional preference vector, which expresses the relative importance (preference) of the hypotheses. Then

  - The preference of a labeling is the dot product of the preference vector and the weight vector (it is the sum for all hypotheses of the weight of each hypothesis times its preference).
  - The preference vector defines a preference direction in N-space. The optimal feasible solution is that one "farthest" in the preference direction. Let $x$ and $y$ be feasible solutions; they are N-dimensional weight vectors satisfying all constraints. If $z = x - y$ has a component in the positive preference direction, then $x$ is a better solution than $y$, by the definition of the preference of a labeling.

  It is helpful for our intuition to let the preference direction define a "downward" direction in N-space, as gravity does in our three-space. Then we wish to pick the lowest (most preferred) feasible solution vector.

• LABELING $\Rightarrow$ OPTIMAL SOLUTION. The relaxation algorithm must solve the linear programming problem: find the best consistent labeling. Under the conditions we have outlined, the best solution vector occurs generally at a vertex of the N-space solid. This is so because usually a vertex will be the "lowest" part of the convex solid in the preference direction. It is a rare coincidence that the solid "rests on a face or edge," but when it does, a whole edge or face of the solid contains equally preferred solutions (the preference direction is normal to


the edge or face). For integer solutions, the solid should be the convex hull of integer solutions and not have any vertices at noninteger supposition values.

The "simplex algorithm" is the best known solution method in linear programming. It proceeds from vertex to vertex, seeking the one that gives the optimal solution. The simplex algorithm is not suited to parallel computation, however, so here we describe another approach with the flavor of hill-climbing optimization. Basically, any such algorithm moves the weight vector around in N-space, iteratively adjusting weights. If they are adjusted one at a time, serial relaxation is taking place; if they are all adjusted at once, the relaxation is parallel iterative. The feasible solution solid and the preference vector define a "cost function" over all N-space, which acts like a potential function in physics. The algorithm tries to reach an optimum (minimum) value for this cost function. As with many optimization algorithms, we can think of the algorithm as trying to simulate (in N-space) a ball bearing (the weight vector) rolling along some path down to a point of minimum gravitational (cost) potential. Physics helps the ball bearing find the minimum; computer optimization techniques are sometimes less reliable.

Translating Constraints to Inequalities

The supposition values, or hypothesis weights, may be encoded into the interval [0, 1], with 0 meaning "false" and 1 meaning "true." The extension of weights to the whole interval is reminiscent of "fuzzy logic," in which truth values may be continuous over some range [Zadeh 1965]. As in Section 12.4.3, we denote supposition values by $p(\,)$; $H$, $A$, $B$, and $C$ are label assignment events, which may be considered as hypotheses that the labels are correctly assigned. $\sim$, $\vee$, $\wedge$, $\Rightarrow$, and $\Leftrightarrow$ are the usual logical connectives relating hypotheses. The connectives allow the expression of complex constraints. For instance, a constraint might be "Label x as 'f' if and only if z is labeled 'w' or q is labeled 'v'." This constraint relates three hypotheses: $h_1$: (x is 'f'), $h_2$: (z is 'w'), $h_3$: (q is 'v'). The constraint is then $h_1 \Leftrightarrow (h_2 \vee h_3)$.
Inequalities may be derived from constraints this way.

1. Negation. $p(H) = 1 - p(\sim H)$.
2. Disjunction. The sum of the weights of the disjuncts is greater than or equal to one. $p(A \vee B \vee \cdots \vee C)$ gives the inequality $p(A) + p(B) + \cdots + p(C) \ge 1$.
3. Conjunction. These are simply separate inequalities, one per conjunct. In particular, a conjunction of disjunctions may be dealt with conjunct by conjunct, producing one disjunctive inequality per conjunct.
4. Arbitrary expressions. These must be put into conjunctive normal form (Chapter 10) by rewriting all connectives as $\wedge$'s and $\vee$'s. Then (3) applies.

As an example, consider the simple case of two hypotheses $A$ and $B$, with the single constraint that $A \Rightarrow B$. Applying rules 1 through 4 results in the following five inequalities in $p(A)$ and $p(B)$; the first four assure weights in [0, 1]. The fifth arises from the logical constraint, since $A \Rightarrow B$ is the same as $B \vee \sim A$.


$$\begin{aligned}
0 &\le p(A) \\
p(A) &\le 1 \\
0 &\le p(B) \\
p(B) &\le 1 \\
p(B) + (1 - p(A)) &\ge 1 \quad \text{or} \quad p(B) \ge p(A)
\end{aligned}$$
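Rules 1-4 are mechanical enough to code. The sketch below, with hypothetical names `bounds`, `clause`, `vertices`, and `best`, turns a disjunctive clause into a row $a \cdot p \le b$, builds the five inequalities for $A \Rightarrow B$, and then, purely for illustration, finds the optimal vertex by brute-force enumeration of the two-variable polygon rather than by a real LP method such as simplex. Exact `Fraction` arithmetic avoids rounding error at the vertices.

```python
from fractions import Fraction
from itertools import combinations

VARIABLES = ("A", "B")


def bounds(variables=VARIABLES):
    """0 <= p(X) <= 1 for every hypothesis, as rows (a, b) meaning a . p <= b."""
    rows = []
    for k in range(len(variables)):
        unit = tuple(1 if m == k else 0 for m in range(len(variables)))
        rows.append((tuple(-u for u in unit), 0))   # -p(X) <= 0
        rows.append((unit, 1))                      #  p(X) <= 1
    return rows


def clause(literals, variables=VARIABLES):
    """Rule 2 with rule 1: sum of p(literal) >= 1, where p(~X) = 1 - p(X).

    literals is a list of (name, positive) pairs; the result is a row (a, b)
    meaning a . p <= b."""
    a = [0] * len(variables)
    negated = 0
    for name, positive in literals:
        k = variables.index(name)
        if positive:
            a[k] -= 1
        else:
            a[k] += 1
            negated += 1
    return tuple(a), negated - 1


def feasible(x, rows):
    return all(sum(ak * xk for ak, xk in zip(a, x)) <= b for a, b in rows)


def vertices(rows):
    """Two-variable case only: intersect pairs of constraint lines and keep
    the feasible intersection points."""
    found = set()
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue                                   # parallel lines
        x = Fraction(b1 * a2[1] - a1[1] * b2, det)     # Cramer's rule
        y = Fraction(a1[0] * b2 - b1 * a2[0], det)
        if feasible((x, y), rows):
            found.add((x, y))
    return sorted(found)


def best(rows, preference):
    """The vertex with the largest dot product with the preference vector."""
    return max(vertices(rows),
               key=lambda v: sum(w * x for w, x in zip(preference, v)))


# A => B is the clause B v ~A.
ROWS = bounds() + [clause([("B", True), ("A", False)])]

if __name__ == "__main__":
    print(vertices(ROWS))
    print(best(ROWS, (1, 1)), best(ROWS, (-1, -1)))
```

The vertices found are (0, 0), (0, 1), and (1, 1), the three corners of the feasible region; a preference vector with both components positive selects (1, 1), and one with both components negative selects (0, 0).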

These inequalities are shown in Fig. 12.11. As expected from the $\Rightarrow$ constraint, optimal feasible solutions exist at: (1, 1) or $(A, B)$; (0, 1) or $(\sim A, B)$; (0, 0) or $(\sim A, \sim B)$. Which of these is preferred depends on the preference vector. If both its components are positive, $(A, B)$ is preferred. If both are negative, $(\sim A, \sim B)$ is preferred, and so on.

A Solution Method

Here we describe (in prose) a search algorithm that can find the optimal feasible solution to the linear programming problem as described above. The description makes use of the mechanical analogy of an N-dimensional solid of feasible solutions, oriented in N-space so that the preference vector induces a "downward" direction in space. The algorithm attempts to move the vector of hypothesis weights to the point in space representing the feasible solution of maximum preference. It should be clear that this is a point on the surface of the solid, and unless the preference vector is normal to a face or edge of the solid, the point is a unique "lowest" vertex.

To establish a potential that leads to feasible solutions, one needs a measure of the infeasibility of a weight vector for each constraint. Define the amount a vector violates a constraint to be zero if it is on the feasible side of the constraint hyperplane. Otherwise the violation is the normal distance of the vector to the hyperplane. If $\mathbf{h}_i$ is the coefficient vector of the $i$th hyperplane (Appendix 1) and $\mathbf{w}$ the weight vector, this distance is

$$d_i = \mathbf{w} \cdot \mathbf{h}_i \qquad (12.30)$$

Fig. 12.11 The feasible region for two hypotheses $A$ and $B$ and the constraint $A \Rightarrow B$. Optimal solutions may occur at the three vertices. The preferred vertex will be that one farthest in the direction of the preference vector, or lowest if the preference vector defines "down."

If we then define the infeasibility as

$$I = \sum_i \frac{d_i^2}{2} \qquad (12.31)$$

then $\partial I / \partial d_i = d_i$ is the rate at which the infeasibility changes for changes in the violation. The force exerted by each constraint is proportional to the normal distance from the weight vector to the feasible region defined by that constraint, and tends to pull the weight vector onto the surface of the solid.

Now add a weak "gravity-like" force in the preference direction to make the weight vector drift to the optimal vertex. At this point an optimization program might perform as shown in Fig. 12.12. Figure 12.12 illustrates a problem: the forces of preference and constraints will usually dictate a minimum potential outside the solid (in the preference direction). Fixes must be applied to force the weight vector back to the closest (presumably the optimum) vertex. One might round high weights to 1 and low ones to 0, or add another local force to draw vectors toward vertices.

Examples

An algorithm based on the principles outlined in the preceding section was successfully used to label scenes of "puppets" such as Fig. 12.13 with body parts [Hinton 1979]. The discrete, consistency-oriented version of line labeling may be extended to incorporate the notion of optimal labelings. Such a system can cope with the explosive increase in consistent labelings that occurs if vertex labels are included for cases of missing lines, accidental alignment, or "two-dimensional" objects such as folded paper. It allows modeling of the fact that human beings do not "see" all possible interpretations of scenes with accidental alignments. If labelings are given

Fig. 12.12 In (a), the weight vector moves from S to rest at T, under the combined influence of the preferences and the violated constraints. In (b), convergence is speeded by making stronger preferences, but the equilibrium is farther away from the optimal vertex.

[Figure 12.13: three program listings ("bestset") giving, for each puppet part (head, neck, trunk, upper arms, lower arms, hands, thighs, calves, feet), the parts it was found to be attached to; listing (b) follows the commands "try to interpret [trunk as upright ...]" and "try to interpret [thigh as upright ...]".]

Fig. 12.13 Puppet scenes interpreted by linear programming relaxation. (a) shows an upside-down puppet; (b) is the same input along with preferences to interpret the trunk and thighs as upright; these result in an interpretation with trunk and neck not connected. In (c), the program finds only the "best" puppet, since it was only expecting one.

costs, then one can include labels for missing lines and accidental alignment as high-cost labels, rendering them usable but undesirable. Also, in a scene analysis system using real data, local evidence for edge appearance can enhance the a priori likelihood that a line should bear a particular label. If such preferences can be extracted along with the lines in a scene, the evidence can be used by the line labeling algorithm.

The inconsistency constraints for line labels may be formalized as follows. Each line and vertex has one label in a consistent labeling; thus for each line L and vertex J,
$$\sum_{\text{all line labels}} p(L \text{ has label LLABEL}) = 1 \qquad (12.32)$$

$$\sum_{\text{all vertex labels}} p(J \text{ has label VLABEL}) = 1 \qquad (12.33)$$

Of course, the VLABELS and LLABELS in the above constraints must be forced to be compatible (if L has LLABEL, the VLABEL of J must agree with it). For a line L and a vertex J at its end,

$$p(L \text{ has LLABEL}) = \sum_{\text{all VLABELS giving LLABEL to } L} p(J \text{ has label VLABEL}) \qquad (12.34)$$

This constraint also enforces the coherence rule (a line may not change its label between vertices). Using these constraints, linear programming relaxation labeled the triangle example of Fig. 12.7 as shown in Fig. 12.14, which shows three cases.

1. Preference 0.5 for each of the three junction label assignments (hypotheses) corresponding to the floating triangle, 0 preference for all other junction and line label hypotheses: converges to floating triangle.
2. Like (1), but with equal preferences given to the junction labels for the triangular hole interpretation, 0 to all other preferences.
3. Preference 3 to the convex edge label for $a_2$ overrides the three preferences of 1/2 for the floating triangle of case (1). All preferences but these four were 0.

Some Extensions

The translation of constraints to inequalities described above does not guarantee that they produce a set of half-spaces whose intersection is the convex hull of the feasible integer solutions. They can produce "noninteger optima," for which supposition values are not forced to 1 or 0. This is reminiscent of the behavior of the linear relaxation operator of Section 12.4.3, and may not be objectionable. If it is, some effort must be expended to cope with it. Here is an example
Fig. 12.14 As in Fig. 12.10, the triangle of (a) is to be assigned labels, and the changing label weights are shown for three cases in (c) using the format of (b). Supposition values for junction labels were used as well, but are not shown. All initial supposition values were 0. [Panel (c) lists the label weights for cases (1)-(3) after 10, 20, and 30 to 40 iterations.]

of the problem. Assume three logical constraints, $\sim(A \wedge B)$, $\sim(B \wedge C)$, and $\sim(C \wedge A)$. Suppose $A$, $B$, and $C$ have equal preferences of unity (the preference vector is (1, 1, 1)). Translating the constraints yields

$$p(A) + p(B) \le 1, \qquad p(B) + p(C) \le 1, \qquad p(C) + p(A) \le 1 \qquad (12.35)$$

The best feasible solution has a total preference of $1\tfrac{1}{2}$, and is

$$p(A) = p(B) = p(C) = \tfrac{1}{2} \qquad (12.36)$$
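The arithmetic of this example can be checked directly. The sketch uses nothing beyond Eqs. (12.35)-(12.37); the helper names are hypothetical. It verifies that the half-integer point is feasible with total preference 3/2, that no feasible 0/1 labeling scores more than 1, and that the stronger "at most one" constraint of Eq. (12.37) removes the half-integer point while keeping every feasible 0/1 labeling. It does not by itself prove that (1/2, 1/2, 1/2) is the LP optimum.

```python
from fractions import Fraction
from itertools import product

PREFERENCE = (1, 1, 1)  # equal unit preferences for A, B, C


def pairwise_ok(p):
    """Eq. (12.35): p(X) + p(Y) <= 1 for the pairs (A,B), (B,C), (C,A)."""
    a, b, c = p
    return a + b <= 1 and b + c <= 1 and c + a <= 1


def cut_ok(p):
    """The cutting plane p(A) + p(B) + p(C) <= 1."""
    return sum(p) <= 1


def preference(p):
    """Dot product of the preference vector and the weight vector."""
    return sum(w * x for w, x in zip(PREFERENCE, p))


half = (Fraction(1, 2),) * 3
integer_points = [p for p in product((0, 1), repeat=3) if pairwise_ok(p)]

if __name__ == "__main__":
    print(pairwise_ok(half), preference(half))           # feasible, total 3/2
    print(max(preference(p) for p in integer_points))    # best 0/1 labeling
    print(cut_ok(half), all(cut_ok(p) for p in integer_points))
```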

Here the "best" solution is outside the convex hull of the integer solutions (Fig. 12.15). The basic way to ensure integer solutions is to use stronger constraints than those arising from the simple rules given above. These may be introduced at first, or when some noninteger optimum has been reached. These stronger constraints are called cutting planes, since they cut off the noninteger optima vertices. In the example above, the obvious stronger constraint is

$$p(A) + p(B) + p(C) \le 1 \qquad (12.37)$$

which says that at most one of $A$, $B$, and $C$ is true (this is a logical consequence of the logical constraints). Such cutting planes can be derived as needed, and can be guaranteed to eliminate all noninteger optimal vertices in a finite number of cuts [Gomory 1968; Garfinkel and Nemhauser 1972]. Equality constraints may be introduced as two inequality constraints in the obvious way: this will constrain the feasible region to a plane.

Fig. 12.15 (a) shows part of the surface of the feasible solid with constraints $\sim(A \wedge B)$, $\sim(B \wedge C)$, $\sim(C \wedge A)$, and the noninteger vertex where the three half-spaces intersect. (b) shows a cutting plane corresponding to the constraint "at most one of A, B, or C" that removes the noninteger vertex.

Suppose that one desires "weak rules," which are usually true but which can be broken if evidence demands it. For each constraint arising from such a rule, add a hypothesis to represent the situation where the rule is broken. This hypothesis is given a negative preference depending on the strength of the rule, and the constraint is enhanced to include the possibility of the broken rule. For example, if a weak rule gives the constraint $P \vee Q$, create a hypothesis $H$ equivalent to $\sim(P \vee Q) = (\sim P \wedge \sim Q)$, and replace the constraint with $P \vee Q \vee H$. Then by "paying the cost" of the negative preference for $H$, we can have neither $P$ nor $Q$ true.

Hypotheses can be created as the algorithm proceeds by having demon-like "generator hypotheses." The demon watches the supposition value of the generator, and when it becomes high enough, runs a program that generates explicit hypotheses. This is clearly useful; it means that all possible hypotheses do not need to be generated in advance of any scene investigation. The generator can be given a preference equal to that of the best hypotheses that it can generate.

Relaxation sometimes should determine a real number (such as the slope of a line) instead of a truth value. A generator-like technique can allow the method to refine the value of real-valued hypotheses. Basically, the idea is to assign a (Boolean-valued) generator hypothesis to a range of values for the real value to be

determined. When this generator triggers, more hypotheses are generated to get a finer partition of the range, and so on.

The enhancements to the linear programming paradigm of relaxation give some idea of the flexibility of the basic idea, but also reveal that the method is not at all cut-and-dried, and is still open to basic investigation. One of the questions about the method is exactly how to take advantage of parallel computation capabilities. Each constraint and hypothesis can be given its own processor, but how should they communicate? Also, there seems little reason to suppose that the optimization problems for this form of relaxation are any easier than they are for any other multidimensional search, so the method will encounter the usual problems inherent in such optimization. However, despite all these technical details and problems of implementation, the linear programming paradigm for the relaxation computation is a coherent formalization of the process. It provides a relatively "classical" context of results and taxonomy of problems [Hummel and Zucker 1980].
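The force-based search described under A Solution Method can be sketched for the two-hypothesis problem $A \Rightarrow B$ of Fig. 12.11. Assumptions not taken from the text: plain gradient steps with all weights adjusted at once (parallel iterative), constraint rows normalized so that each $d_i$ is a true normal distance, and the hypothetical parameter names `gravity` and `step`. The run shows the problem of Fig. 12.12: with preference (1, 1) the weight vector comes to rest near (1 + gravity, 1 + gravity), just outside the solid, and the rounding fix mentioned in the text is needed to land on the vertex (1, 1).

```python
import math

# Constraint rows (a, b) meaning a . w <= b, for hypotheses (A, B) with A => B.
CONSTRAINTS = [
    ((-1.0, 0.0), 0.0),   # p(A) >= 0
    ((1.0, 0.0), 1.0),    # p(A) <= 1
    ((0.0, -1.0), 0.0),   # p(B) >= 0
    ((0.0, 1.0), 1.0),    # p(B) <= 1
    ((1.0, -1.0), 0.0),   # p(B) >= p(A)
]


def constraint_force(w):
    """-grad I for I = sum(d_i**2) / 2, where d_i is the normal distance from
    w to hyperplane i when w is on its infeasible side, and 0 otherwise."""
    fx = fy = 0.0
    for (ax, ay), b in CONSTRAINTS:
        norm = math.hypot(ax, ay)
        d = (ax * w[0] + ay * w[1] - b) / norm
        if d > 0:                      # only violated constraints pull
            fx -= d * ax / norm
            fy -= d * ay / norm
    return fx, fy


def relax(w, preference, gravity=0.05, step=0.1, iterations=500):
    """Move w under the constraint forces plus a weak pull along the preference."""
    for _ in range(iterations):
        fx, fy = constraint_force(w)
        w = (w[0] + step * (fx + gravity * preference[0]),
             w[1] + step * (fy + gravity * preference[1]))
    return w


def snap(w):
    """One fix from the text: clip to [0, 1], then round to 0 or 1."""
    return tuple(int(round(min(1.0, max(0.0, x)))) for x in w)


if __name__ == "__main__":
    w = relax((0.3, 0.6), preference=(1.0, 1.0))
    print([round(x, 3) for x in w], snap(w))
```

Raising `gravity` speeds the drift but moves the resting point farther from the vertex, which is the tradeoff shown in Fig. 12.12b.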
12.5 ACTIVE KNOWLEDGE

Active knowledge systems [Freuder 1975] are characterized by the use of procedures as the elementary units of knowledge (as opposed to propositions or data base items, for instance). We describe how active knowledge might work, because it is a logical extreme of the procedural implementation of propositions. In fact, this style of control has not proven influential; some reasons are given below.

Active knowledge is notionally parallel and heterarchical. Many different procedures can be active at the same time depending on the input. For this reason active knowledge is more easily applied to belief maintenance than to planning; it is very difficult to organize sequential activity within this discipline. Basically, each procedure is responsible for a "chunk" of knowledge, and knows how to manage it with respect to different visual inputs. Control in an active knowledge system is completely distributed. Active knowledge can also be viewed as an extension of the constraint relaxation problem; powerful procedures can make arbitrarily detailed tests of the consistency between constraints.

Each piece of active knowledge (program module) knows which other modules it depends on, which depend on it, which it can complain to, and so forth. Thus the choice of "what to do next" is contained in the modules and is not made by an exterior executive.

We describe HYPER, a particular active knowledge system design which illustrates typical properties of active knowledge [Brown 1975]. HYPER provides a less structured mechanism for construction and exploration of hypotheses than does LP relaxation. Using primitive control functions of the system, the user may write programs for establishing hypotheses and for using the conclusions so reached. The programs are "procedurally embedded" knowledge about a problem domain (e.g., how events relate one to another, what may be conjectured or inferred from a clue, or how one might verify a hypothesis).

When HYPER is in use on a particular task in a domain, hypotheses are created, or instantiated, on the basis of low-level input, high-level beliefs, or any

reason in between. The process of establishing the initial hypotheses leads to a propagation of activity (creation, verification, and disconfirmation of hypotheses). Activation patterns will generally vary with the particular task, in heterarchical fashion. A priority mechanism can rank hypotheses in importance depending on the data that contribute to them. Generally, the actions that occur are conditioned by previous assumptions, the data, the success of methods, and other factors. HYPER can be used for planning applications and for multistep vision processing as well as inference (procedures then should generate parallel activity only under tight control). We shall thus allow HYPER to make use of a context-oriented data base (Section 13.1.1). It will use the context mechanism to implement "alternative worlds" in which to reason.
12.5.1 Hypotheses

A HYPER hypothesis is the attribution of a predicate to some arguments; its name is always of the form (PREDICATE ARGUMENTS). Sample hypothesis names could be (HEADSHAPED REGION1), (ABOVE A B), (TRIANGLE (X1,Y1) (X2,Y2) (X3,Y3)). A hypothesis is represented as a data structure with four components: the status, contents, context, and links of the hypothesis.

The status represents the state of HYPER's knowledge of the truth of the hypothesis; it may be T(rue), F(alse) (in either case the hypothesis has been established), or P(ending). The contents are arbitrary; hypotheses are not just truth-valued assertions. The hypothesis was asserted in the data base context given in context. The links of a hypothesis H are pointers to other hypotheses that have asked that H be established because they need H's contents to complete their own computations.
12.5.2 HOWTO and SOWHAT Processes

Two processes are associated with every predicate P which appears as the predicate of a hypothesis. Their names are (HOWTO P) and (SOWHAT P). In them is embedded the procedural knowledge of the system which remains compiled in from one particular task to another in a problem domain. (HOWTO P) expresses how to establish the hypothesis (P arguments). It knows what other hypotheses must be established first, the computations needed to establish (P arguments), and so forth. It has a backward-chaining flavor. Similarly, (SOWHAT P) expresses the consequences of knowing P: what hypotheses could possibly now be established using the contents of (P arguments), what alternative hypotheses should be explored if the status of (P arguments) is F, and so on. The feeling here is of forward chaining.
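A HYPER hypothesis and its HOWTO and SOWHAT programs map naturally onto a record and two dispatch tables keyed by predicate. The Python below is a hypothetical rendering for illustration only, not HYPER's implementation; the ABOVE predicate, its (name, y) arguments, and the choice of the converse hypothesis as the SOWHAT alternative are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Hypothesis:
    name: Tuple[Any, ...]            # (PREDICATE, arg1, arg2, ...)
    status: str = "P"                # "T", "F", or "P" (pending)
    contents: Any = None             # arbitrary value, not just a truth value
    context: Optional[str] = None    # data base context it was asserted in
    links: List[Any] = field(default_factory=list)  # programs waiting on it

    @property
    def predicate(self) -> str:
        return self.name[0]


# One HOWTO (backward chaining) and one SOWHAT (forward chaining) per predicate.
HOWTO: Dict[str, Callable[[Hypothesis], None]] = {}
SOWHAT: Dict[str, Callable[[Hypothesis], List[Hypothesis]]] = {}


def howto(predicate: str):
    """Decorator registering the (HOWTO predicate) program."""
    def register(fn):
        HOWTO[predicate] = fn
        return fn
    return register


def sowhat(predicate: str):
    """Decorator registering the (SOWHAT predicate) program."""
    def register(fn):
        SOWHAT[predicate] = fn
        return fn
    return register


@howto("ABOVE")
def howto_above(h: Hypothesis) -> None:
    """Establish (ABOVE a b) from image rows; smaller y means higher."""
    (_, y_a), (_, y_b) = h.name[1], h.name[2]
    h.status = "T" if y_a < y_b else "F"
    h.contents = y_b - y_a           # contents: the vertical separation


@sowhat("ABOVE")
def sowhat_above(h: Hypothesis) -> List[Hypothesis]:
    """If a is not above b, propose the alternative (ABOVE b a)."""
    if h.status == "F":
        return [Hypothesis(("ABOVE", h.name[2], h.name[1]))]
    return []


h = Hypothesis(("ABOVE", ("A", 10), ("B", 40)))
HOWTO[h.predicate](h)                # establishes the hypothesis
print(h.status, h.contents, SOWHAT[h.predicate](h))
```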

12.5.3 Control Primitives

Sec. 12.5 Active Knowledge 431

HYPER hypotheses interact through primitive control statements, which affect the investigation of hypotheses and the ramification of their consequences. The primitives are used in HOWTO and SOWHAT programs together with other general computations. Most primitives have an argument called priority, which expresses the reliability, urgency, or importance of the action they produce, and is used to schedule processes in a nonparallel computing environment (implemented as a priority job queue [Appendix 2]). The primitives are GET, AFFIRM, DENY, ASSUME, RETRACT, FAIL, WONDERIF, and NUDGE.

GET is to ascertain or establish the status and contents of a hypothesis. It takes a hypothesis H and priority PRI as arguments and returns the status and contents of the hypothesis. If H's status is T or F at the time of execution of the statement, the status and contents are returned immediately. If the status is P (pending), or if H has not been created yet, the current HOWTO or SOWHAT program calling GET (call it CURPROG) is exited, the proper HOWTO job (i.e., the one that deals with H's predicate) is run at priority PRI with argument H, and a link is planted in H back to CURPROG. When H is established, CURPROG will be reactivated through the link mechanism.

AFFIRM is to assert a hypothesis as true with some contents. AFFIRM(H, CONT, PRI) sets H's status to T and its contents to CONT, activates its linked programs, and then executes the proper SOWHAT program on it. The newly activated SOWHAT programs are performed with priority PRI.

DENY is to assert that a hypothesis with some contents is false. DENY(H, CONT, PRI) is like AFFIRM except that no activation through links occurs, and the status of H is of course set to F.

ASSUME is to assert a hypothesis as true hypothetically. ASSUME(H, CONT, PRI) uses the data base context mechanism to create a new context in which H is AFFIRMed; the original context in which the ASSUME command is given is preserved in the context field of H. H itself is stored into a context-dependent item named LASTASSUMED; this corresponds to remembering a decision point in PLANNER. By using the information in LASTASSUMED and the primitive FAIL (see below), simple backtracking can take place in a tree of contexts.

RETRACT(H) establishes as false a hypothesis that was previously ASSUMEd. RETRACT is always carried out at highest priority, on the principle that it is good to leave the context of a mistaken assumption as quickly as possible. Information (including the name of the context being exited) is transmitted back to the original context in which H was ASSUMEd by passing it back in the fields of H.

FAIL just RETRACTs the hypothesis that is the value of the item LASTASSUMED in the present context.

WONDERIF is to pass suggested contents to HOWTO processes for verification. It can be useful if verifying a value is easier than computing it from scratch, and is the primitive that passes substantive suggestions. WONDERIF(H1, CONT, H2, PRI) approximates the notion "H2 wonders if H1 has contents CONT."

NUDGE is to wake up HOWTO programs. NUDGE(H, PRI) runs the HOWTO program on H with priority PRI. It is used to awaken hypotheses that might be able to use information just computed. Typically it is a SOWHAT program that NUDGEs others, since the SOWHAT program is responsible for using the fact that a hypothesis is known.
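The way GET exits the calling program, plants a link, and relies on AFFIRM to reactivate the caller can be sketched with a priority job queue. This is a hypothetical model, not HYPER's code: programs are Python functions taking the scheduler and one argument, reactivation simply reruns the calling program from the top, and the EDGE and CORNER predicates are invented. Python's `heapq` is a min-heap, so priorities are negated to run the highest-priority job first.

```python
import heapq
import itertools


class Hyper:
    """Toy priority-queue scheduler for GET, AFFIRM, and DENY."""

    def __init__(self):
        self.hyps = {}            # name -> {"status", "contents", "links"}
        self.queue = []           # heap of (-priority, tiebreak, program, arg)
        self.counter = itertools.count()
        self.howto = {}           # predicate -> program(hyper, name)
        self.sowhat = {}          # predicate -> program(hyper, name)

    def hyp(self, name):
        return self.hyps.setdefault(
            name, {"status": "P", "contents": None, "links": []})

    def schedule(self, pri, program, arg):
        heapq.heappush(self.queue, (-pri, next(self.counter), program, arg))

    def get(self, name, caller, caller_arg, pri):
        """Return (status, contents) if established; otherwise plant a link
        back to the caller, schedule the HOWTO job, and return None so the
        caller can exit."""
        h = self.hyp(name)
        if h["status"] in ("T", "F"):
            return h["status"], h["contents"]
        h["links"].append((caller, caller_arg))
        self.schedule(pri, self.howto[name[0]], name)
        return None

    def affirm(self, name, contents, pri):
        h = self.hyp(name)
        h["status"], h["contents"] = "T", contents
        for caller, arg in h["links"]:        # reactivate waiting programs
            self.schedule(pri, caller, arg)
        h["links"].clear()
        if name[0] in self.sowhat:
            self.schedule(pri, self.sowhat[name[0]], name)

    def deny(self, name, contents, pri):
        h = self.hyp(name)
        h["status"], h["contents"] = "F", contents   # no link activation
        if name[0] in self.sowhat:
            self.schedule(pri, self.sowhat[name[0]], name)

    def run(self):
        while self.queue:
            _, _, program, arg = heapq.heappop(self.queue)
            program(self, arg)


# Hypothetical two-predicate domain: a CORNER needs an EDGE.
log = []


def howto_edge(hy, name):
    log.append(name)
    hy.affirm(name, contents=0.9, pri=5)


def howto_corner(hy, name):
    log.append(name)
    got = hy.get(("EDGE", "e1"), howto_corner, name, pri=5)
    if got is None:
        return                # exited; rerun later through the link
    hy.affirm(name, contents=got[1], pri=5)


hy = Hyper()
hy.howto = {"EDGE": howto_edge, "CORNER": howto_corner}
hy.schedule(1, howto_corner, ("CORNER", "c1"))
hy.run()
```

The log records the CORNER program running and exiting at GET, the EDGE job running, and the CORNER program being rerun through the link once EDGE is AFFIRMed.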


12.5.4 Aspects of Active Knowledge

The active knowledge style of computation raises a number of questions or problems for its users.

A hypothesis whose contents may attain a large range can be established for some contents and thus express a perfectly good fact (e.g., that a given location of an x-ray does not contain evidence for a tumor), but such a fact is usually of little help when we want to reason about the predicate (about the location of tumors). The SOWHAT program for a predicate should be written so as to draw conclusions from such negative facts if possible, and from the conclusions endeavor to establish the hypothesis as true for some contents. Usually, therefore, it would set the status of the hypothesis back to P and initiate a new line of attack, or at its discretion abandon the effort and start an entirely new line of reasoning.
Priorities

A major worry with the scheme as described is that priorities are used to schedule the running of HOWTO and SOWHAT processes, not to express the importance (or supposition value) of the hypotheses. The hypothesis being investigated has no way to communicate how important it is to the program that operates on it, so it is impossible to accumulate importance through time. A very significant fact may lie ignored because it was given to a self-effacing process that had no way of knowing it had been handed something out of the ordinary. The obvious answer is to make a supposition value a field of the hypothesis, like its status or contents: a hypothesis should be given a measure of its importance. This value may be used to compute execution priorities for jobs involving it. This solution is used in some successful systems [Turner 1974].

Structuring Knowledge

One has a wide choice in how to structure the "theory" of a complex problem in terms of HYPER primitives, predicates, arguments, and HOWTO and SOWHAT processes. The set of HOWTO and SOWHAT processes specifies the complete theory of the tasks to be performed. HYPER encourages one to consider the interrelations between widely separated and distinct-sounding facts and conjectures about a problem, and the structure it imposes on a problem is minimal.

Since HOWTO and SOWHAT processes make explicit references to one another via the primitives, they are not "modular" in the sense that they can easily be plugged in and unplugged. If HOWTO and SOWHAT processes are invoked by patterns instead of by names, some of the edge is taken off this criticism. Removing a primitive from a program could drastically modify the avenues of activation, and the consequences of such a modification are sometimes hard to foresee in a program that logically could be running in parallel.

Writing a large and effective program for one domain may not help to write a program for another domain. New problems will occur in the new domain: segmenting the theory into predicates, quantifying their interactions via the primitives, setting up a priority structure, and so forth. It seems quite likely that little more than basic utility programs will carry over between domains.

Sec. 12.5 Active Knowledge

EXERCISES

12.1 In the production system example, write a production that specifies that blue regions are sky using the opponents color notation. How would you now deal with blue regions that are lakes (a) in the existing color-only system; (b) in a system which has surface orientation information?
12.2 This theorem was posed as a challenge for a clausal automatic theorem prover [Henschen et al. 1980]. It is obviously true: what problems does it present?

$$\{[(\exists x)(\forall y)(P(x) \Leftrightarrow P(y))] \Leftrightarrow [(\exists x)Q(x) \Leftrightarrow (\forall y)P(y)]\} \Leftrightarrow \{[(\exists x)(\forall y)(Q(x) \Leftrightarrow Q(y))] \Leftrightarrow [(\exists x)P(x) \Leftrightarrow (\forall y)Q(y)]\}$$


12.3 Prove that the operator of Eq. (12.18) takes probability vectors into probability vectors, thus deriving the reason for Eq. (12.19).
12.4 Verify (12.23).
12.5 How do the $c_{ij}$ of (12.18) affect the labeling? What is their semantics?
12.6 If events X and Y always co-occur, then p(X, Y) = p(X) = p(Y). What is the correlation in this case? If X and Y never co-occur, what values of p(X) and p(Y) produce a minimum correlation? If X and Y are independent, how is p(X, Y) related to p(X) and p(Y)? What is the value of the correlation of independent X and Y?
12.7 Complete Table 12.3.
12.8 Use only the labels of Fig. 12.9b and c to compute covariances in the manner of Table 12.3. What do you conclude?
12.9 Show that Eq. (12.29) preserves the important properties of the weight vectors.
12.10 Think of some rival normalization schemes to Eq. (12.29) and describe their properties.
12.11 Implement the linear and nonlinear operators of Sections 12.4.3 and 12.4.4 and investigate their properties. Include your ideas from Exercise 12.10.
12.12 Show a case in which the nonlinear operator of Eq. (12.29) assigns nonzero weights to maximally incompatible labels (those with $r_{ij} = -1$).
12.13 How can a linear programming relaxation such as the one outlined in Sec. 12.4.5 cope with faces or edges of the feasible solution solid that are normal to the preference direction, yielding several solutions of equal preference?
12.14 In Fig. 12.11, what (P, Q) solution is optimal if the preference vector is (1, 4)? (4, 1)? (1, 1)? (1, 0)?

REFERENCES

AIKINS, J. S. "Prototypes and production rules: a knowledge representation for computer consultations." Ph.D. dissertation, Computer Science Dept., Stanford Univ., 1980.
BAJCSY, R. and A. K. JOSHI. "A partially ordered world model and natural outdoor scenes." In CVS, 1978.
BARROW, H. G. and J. M. TENENBAUM. "MSYS: a system for reasoning about scenes." Technical Note 121, AI Center, SRI International, March 1976.
BRACHMAN, R. J. "On the epistemological status of semantic networks." In Associative Networks: Representation and Use of Knowledge by Computers, N. V. Findler (Ed.). New York: Academic Press, 1979, 3-50.
BROWN, C. M. "The HYPER system." DAI Working Paper 9, Dept. of Artificial Intelligence, Univ. Edinburgh, July 1975.
BUCHANAN, B. G. and E. A. FEIGENBAUM. "DENDRAL and meta-DENDRAL: their applications dimensions." Artificial Intelligence 11, 2, 1978, 5-24.
BUCHANAN, B. G. and T. M. MITCHELL. "Model-directed learning of production rules." In Pattern-Directed Inference Systems, D. A. Waterman and F. Hayes-Roth (Eds.). New York: Academic Press, 1978.
COLLINS, A. "Fragments of a theory of human plausible reasoning." Theoretical Issues in Natural Language Processing 2, Univ. Illinois at Urbana-Champaign, July 1978, 194-201.
DAVIS, R. and J. KING. "An overview of production systems." AIM-271, Stanford AI Lab, October 1975.
DAVIS, L. S. and A. ROSENFELD. "Applications of relaxation labelling 2. Spring-loaded template matching." Technical Report 440, Computer Science Center, Univ. Maryland, 1976.
DELIYANNI, A. and R. A. KOWALSKI. "Logic and semantic networks." Comm. ACM 22, 3, March 1979, 184-192.
ERMAN, L. D. and V. R. LESSER. "A multilevel organization for problem solving using many, diverse, cooperating sources of knowledge." Proc., 4th IJCAI, September 1975, 483-490.
FELDMAN, J. A. and Y. YAKIMOVSKY. "Decision theory and artificial intelligence: I. A semantics-based region analyser." Artificial Intelligence 5, 4, 1974, 349-371.
FIKES, R. E. "Knowledge representation in automatic planning systems." In Perspectives on Computer Science, A. Jones (Ed.). New York: Academic Press, 1977.
FIKES, R. E. and N. J. NILSSON. "STRIPS: a new approach to the application of theorem proving to problem solving." Artificial Intelligence 2, 3/4, 1971, 189-208.
FREUDER, E. C. "A computer system for visual recognition using active knowledge." Ph.D. dissertation, MIT, 1975.
FREUDER, E. C. "Synthesizing constraint expressions." Comm. ACM 21, 11, November 1978, 958-965.
GARFINKEL, R. S. and G. L. NEMHAUSER. Integer Programming. New York: Wiley, 1972.
GOMORY, R. E. "An algorithm for integer solutions to linear programs." Bull. American Mathematical Society 64, 1968, 275-278.
HARALICK, R. M. "The characterization of binary relation homomorphisms." International J. General Systems 4, 1978, 113-121.
HARALICK, R. M. and J. S. KARTUS. "Arrangements, homomorphisms, and discrete relaxation." IEEE Trans. SMC 8, 8, August 1978, 600-612.
HARALICK, R. M. and L. G. SHAPIRO. "The consistent labeling problem: Part I." IEEE Trans. PAMI 1, 2, April 1979, 173-184.


HARALICK, R. M., L. S. DAVIS, and A. ROSENFELD. "Reduction operations for constraint satisfaction." Information Sciences 14, 1978, 199-219.
HAYES, P. J. "In defense of logic." Proc., 5th IJCAI, August 1977, 559-565.
HAYES, P. J. "Naive physics: ontology for liquids." Working paper, Institute for Semantic and Cognitive Studies, Geneva, 1978a.
HAYES, P. J. "The naive physics manifesto." Working paper, Institute for Semantic and Cognitive Studies, Geneva, 1978b.
HAYES, P. J. "The logic of frames." The Frame Reader. Berlin: De Gruyter, in press, 1981.
HENDRIX, G. G. "Encoding knowledge in partitioned networks." In Associative Networks: Representation and Use of Knowledge by Computers, N. V. Findler (Ed.). New York: Academic Press, 1979, 51-92.
HENSCHEN, L., E. LUSK, R. OVERBEEK, B. SMITH, R. VEROFF, S. WINKER, and L. WOS. "Challenge Problem 1." SIGART Newsletter 72, July 1980, 30-31.
HERBRAND, J. "Recherches sur la théorie de la démonstration." Travaux de la Société des Sciences et des Lettres de Varsovie, Classe III, Sciences Mathématiques et Physiques, 33, 1930.
HEWITT, C. "Description and theoretical analysis (using schemata) of PLANNER" (Ph.D. dissertation). AI-TR-258, AI Lab, MIT, 1972.
HINTON, G. E. "Relaxation and its role in vision." Ph.D. dissertation, Univ. Edinburgh, December 1979.
HUMMEL, R. A. and S. W. ZUCKER. "On the foundations of relaxation labelling processes." TR-80-7, Computer Vision and Graphics Lab, Dept. of Electrical Engineering, McGill Univ., July 1980.
KOWALSKI, R. A. "Predicate logic as a programming language." Information Processing 74. Amsterdam: North-Holland, 1974, 569-574.
KOWALSKI, R. A. Logic for Problem Solving. New York: Elsevier North-Holland (AI Series), 1979.
LINDSAY, R. K., B. G. BUCHANAN, E. A. FEIGENBAUM, and J. LEDERBERG. Applications of Artificial Intelligence to Chemistry: The DENDRAL Project. New York: McGraw-Hill, 1980.
LOVELAND, D. "A linear format for resolution." Proc., IRIA 1968 Symp. on Automatic Demonstration, Versailles, France. New York: Springer-Verlag, 1970.
LOVELAND, D. Automated Theorem Proving: A Logical Basis. Amsterdam: North-Holland, 1978.
MCCARTHY, J. "Circumscription: induction, a way of jumping to conclusions." Unpublished report, Stanford AI Lab, 1978.
MCCARTHY, J. and P. J. HAYES. "Some philosophical problems from the standpoint of artificial intelligence." In MI4, 1969.
MCDERMOTT, D. "The PROLOG phenomenon." SIGART Newsletter 72, July 1980, 16-20.
MENDELSON, E. Introduction to Mathematical Logic. Princeton, NJ: D. Van Nostrand, 1964.
MINSKY, M. L. "A framework for representing knowledge." In PCV, 1975.
NEWELL, A., J. SHAW, and H. SIMON. "Empirical explorations of the logic theory machine." In Computers and Thought, E. Feigenbaum and J. Feldman (Eds.). New York: McGraw-Hill, 1963.
NILSSON, N. J. Problem-Solving Methods in Artificial Intelligence. New York: McGraw-Hill, 1971.
NILSSON, N. J. Principles of Artificial Intelligence. Palo Alto, CA: Tioga, 1980.
REITER, R. "On reasoning by default." Theoretical Issues in Natural Language Processing 2, Univ. Illinois at Urbana-Champaign, July 1978, 210-218.
ROBINSON, J. A. "A machine-oriented logic based on the resolution principle." J. ACM 12, 1, January 1965, 23-41.
ROSENFELD, A., R. A. HUMMEL, and S. W. ZUCKER. "Scene labelling by relaxation operations." IEEE Trans. SMC 6, 1976, 420.


RYCHNER, M. "An instructable production system: basic design issues." In Pattern-Directed Inference Systems, D. A. Waterman and F. Hayes-Roth (Eds.). New York: Academic Press, 1978.
SHORTLIFFE, E. H. Computer-Based Medical Consultations: MYCIN. New York: American Elsevier, 1976.
SLOAN, K. R. "World model driven recognition of natural scenes." Ph.D. dissertation, Moore School of Electrical Engineering, Univ. Pennsylvania, June 1977.
SLOAN, K. R. and R. BAJCSY. "World model driven recognition of outdoor scenes." TR-40, Computer Science Dept., Univ. Rochester, September 1979.
SUSSMAN, G. J. and D. MCDERMOTT. "Why conniving is better than planning." AI Memo 255, AI Lab, MIT, 1972.
TURNER, K. J. "Computer perception of curved objects using a television camera." Ph.D. dissertation, School of Artificial Intelligence, Univ. Edinburgh, 1974.
WARREN, H. D., L. PEREIRA, and F. PEREIRA. "PROLOG: the language and its implementation compared with LISP." Proc., Symp. on Artificial Intelligence and Programming Languages, SIGPLAN/SIGART, 1977; SIGPLAN Notices 12, 8, August 1977, 109-115.
WATERMAN, D. A. and F. HAYES-ROTH (Eds.). Pattern-Directed Inference Systems. New York: Academic Press, 1978.
WINOGRAD, T. "Extended inference modes in reasoning by computer systems." Proc., Conf. on Inductive Logic, Oxford Univ., August 1978.
ZADEH, L. "Fuzzy sets." Information and Control 8, 1965, 338-353.
ZUCKER, S. W. "Relaxation labelling and the reduction of local ambiguities." Technical Report 451, Computer Science Dept., Univ. Maryland, 1976.


13 GOAL ACHIEVEMENT

Goal Achievement and Vision

Goals and plans are important for visual processing. Some skilled vision actually is like problem solving. Vision for information gathering can be part of a planned sequence of actions. Planning can be a useful and efficient way to guide many visual computations, even those that are not meant to imply "conscious" cognitive activity.

The artificial intelligence activity often called planning traditionally has dealt with "robots" (real or modeled) performing actions in the real world. Planning has several aspects.

1. Avoid nasty "subgoal interactions," such as getting painted into a corner.
2. Find the plan with optimal properties (least risk, least cost, maximized "goodness" of some variety).
3. Derive a sequence of steps that will achieve the goal from the starting situation.
4. Remember effective action sequences so that they may be applied in new situations.
5. Apply planning techniques to giving advice, presumably by simulating the advisee's actions and making the next step from the point they left off.
6. Recover from errors or changes in conditions that occur in the middle of a plan.

Traditional planning research has not concentrated on plans with information-gathering steps, such as vision. The main interest in planning research has been the expensive and sometimes irrevocable nature of actions in the world. Our goal is to give a flavor of the issues that are pursued in much more detail in the planning literature [Nilsson 1980; Tate 1977; Fahlman 1974; Fikes and Nilsson 1971; Fikes et al. 1972a; 1972b; Warren 1974; Sacerdoti 1974; 1977; Sussman 1975].

Planning concerns an active agent and its interaction with the world. This conception does not fit with the idea of vision as a passive activity. However, one claim of this book is that much of vision is a constructive, active, goal-oriented process, replete with uncertainty. Then a model of vision as a sequence of decisions punctuated by more or less costly information-gathering steps becomes more compelling. Vision often is a sequential (recursive, cyclical) process of alternating information gathering and decision making. This paradigm is quite common in computer vision [Shirai 1975; Ballard 1978; Mackworth 1978; Ambler et al. 1975]. However, the formalization of the process in terms of minimizing cost or maximizing utility is not so common [Feldman and Sproull 1977; Ballard 1978; Garvey 1976]. This section examines the paradigms of planning, evaluating plans with costs and utilities, and how plans may be applied to vision processing.

13.1 SYMBOLIC PLANNING

In artificial intelligence, planning is usually a form of problem-solving activity involving a formal "simulation" of a physical world. (Planning, theorem proving, and state-space problem solving are all closely related.) There is an agent (the "robot") who can perform actions that transform the state of the simulated world. The robot planner is confronted with an initial world state and a set of goals to be achieved. Planning explores world states resulting from actions, and tries to find a sequence of actions that achieves the goals. The states can be arranged in a tree with the initial state as the root, and branches resulting from applying different actions in a state. Planning is a search through this tree, resulting in a path or sequence of actions from the root to a state in which the goals are achieved. Usually there is a metric over action sequences; the simplest is that there be as few actions as possible. More generally (Section 13.2), actions may be assigned some cost which the planner should minimize.
13.1.1 Representing the World

This section illustrates planning briefly with a classical example: block stacking. In one simple form there are three blocks initially stacked as shown on the left in Fig. 13.1, to be stacked as shown on the right. This task may be "formalized" [Bundy 1978] using only the symbolic objects Floor, A, B, and C. (A formalization suitable for a real automated planner must be much more careful about details than we shall be.) Assume that only a single block can be picked up at a time. The necessary predicates are CLEAR(X), which is true if a block may be put directly on X and which must be true before X may be picked up, and ON(X, Y), which is true if X is resting directly on Y. Let us stipulate that the Floor is always CLEAR, but otherwise if ON(X, Y) is true, Y is not CLEAR. Then the initial situation in Fig. 13.1 is characterized by the following assertions.


[Figure: initial stacks (left) and goal stack (right), both resting on the Floor.]
Fig. 13.1 A simple block-stacking task.

INITIAL STATE: ON(C, A), ON(A, Floor), ON(B, Floor), CLEAR(C), CLEAR(B), CLEAR(Floor)

The goal state is one in which the following two assertions are true.

GOAL ASSERTIONS: ON(A, B), ON(B, C)

With only these rules, the formalization of the block-stacking world yields a very "loose" semantics. (The task easily translates to sorting integers with some restrictions on operations, or to the "seriation" task of arranging blocks horizontally in order of size, or a host of others.)

Actions transform the set of assertions describing the world. For problems of realistic scale, the representation of the tree of world states is a practical problem. The issue is one of maintaining several coexisting "hypothetical worlds" and reasoning about them. This is another version of the frame problem discussed in Sec. 12.1.6. One way to solve this problem is to give each assertion an extra argument, naming the hypothetical world (usually called a situation [Nilsson 1980; McCarthy and Hayes 1969]) in which the assertion holds. Then actions map situations to situations as well as introducing and changing assertions.

An equivalent way to think about (and implement) multiple, dependent, hypothetical worlds is with a tree-structured, context-oriented data base. This idea is a general one that is useful in many artificial intelligence applications, not just symbolic planning. Such data bases are included in many artificial intelligence languages and appear in other more traditional environments as well. A context-oriented data base acts like a tree of data bases; at any node of the tree is a set of assertions that makes up the data base. A new data base (context) may be spawned from any context (data base) in the tree. All assertions that are true in the spawning (ancestor) context are initially true in the spawned (descendant) context. However, new assertions added in any context or deleted from it do not affect its ancestor. Thus by going back to the ancestor, all data base changes performed in the descendant context disappear.

Implementing such a data base is an interesting exercise.
Copying all assertions to each new context is possible, but very wasteful if only a few changes are made in each context. The following mechanism is much more efficient. The root or initial context has some set of assertions in it, and each descendant context is merely an add list of assertions to add to the data base and a delete list of assertions to delete. Then to see if an assertion is true in a context, do the following.

1. If the context is the root context, look it up "as usual."
2. Otherwise, if the assertion is on the add list of this context, return true. If the assertion is on the delete list of this context, return false.
3. Otherwise, recursively apply this procedure to the ancestor of this context.

In a general programming environment, contexts have names, and there is the facility of executing procedures "in" particular contexts, moving around the context tree, and so forth. However, in what follows, only the ability to look up assertions in contexts is relevant.
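The three-step lookup can be sketched in Python as follows. This is a minimal illustration that assumes assertions are represented as tuples such as ("ON", "C", "A"). The `Context` class and its method names are hypothetical, and the recursion of step 3 is written as a loop that walks up to the ancestor.

```python
class Context:
    """One node of a tree-structured, context-oriented data base.

    The root stores its assertions directly; every descendant stores only
    an add list and a delete list relative to its ancestor.
    """

    def __init__(self, parent=None, assertions=()):
        self.parent = parent
        self.assertions = set(assertions)  # consulted only at the root
        self.add_list = set()
        self.delete_list = set()

    def spawn(self):
        """Create a descendant that initially sees everything this context sees."""
        return Context(parent=self)

    def add(self, assertion):
        if self.parent is None:
            self.assertions.add(assertion)
        else:
            self.delete_list.discard(assertion)
            self.add_list.add(assertion)

    def delete(self, assertion):
        if self.parent is None:
            self.assertions.discard(assertion)
        else:
            self.add_list.discard(assertion)
            self.delete_list.add(assertion)

    def holds(self, assertion):
        """The three-step lookup, with the recursion of step 3 written as a loop."""
        context = self
        while context.parent is not None:
            if assertion in context.add_list:      # step 2: on this add list
                return True
            if assertion in context.delete_list:   # step 2: on this delete list
                return False
            context = context.parent               # step 3: ask the ancestor
        return assertion in context.assertions     # step 1: root, look up as usual


root = Context(assertions={("ON", "C", "A"), ("ON", "A", "Floor"), ("CLEAR", "C")})
child = root.spawn()
child.delete(("ON", "C", "A"))
child.add(("ON", "C", "Floor"))
print(child.holds(("ON", "C", "A")), root.holds(("ON", "C", "A")))  # False True
```

No assertions are copied when `child` is spawned. Going back to `root` makes every change made in `child` disappear, because those changes live only in the child's two lists.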
13.1.2 Representing Actions

Represent an action as a triple.

ACTION ::= [PATTERN, PRECONDITIONS, POSTCONDITIONS].

Here the pattern gives the name of the action and names for the objects with which it deals: its "formal parameters." Preconditions and postconditions may use the formal variables of the pattern. In a sense, the preconditions and postconditions are the "body" of the action, with subroutine-like "variable bindings" taking place when the action is to be performed. The preconditions give the world states in which the action may be applied. Here the preconditions are assumed simply to be a list of assertions, all of which must be true. The postconditions describe the world state that results from performing the action. The context-oriented data base of hypothetical worlds can be used to implement the postconditions.

POSTCONDITIONS ::= [ADDLIST, DELETELIST].

An action is then performed as follows.

1. Bind the pattern variables to entities in the world, thus binding the associated variables in the preconditions and postconditions.
2. If the preconditions are met (the bound assertions exist in the data base), do the next step; else exit reporting failure.
3. Delete the assertions in the delete list, add those in the add list, and exit reporting success.

Here is the Move action for our block-stacking example.
Move Object X from Y to Z
PATTERN: Move(X, Y, Z)
PRECONDITIONS: CLEAR(X), CLEAR(Z), ON(X, Y)
DELETE LIST: ON(X, Y), CLEAR(Z)
ADD LIST: ON(X, Z), CLEAR(Y)

Here X, Y, and Z are all variables bound to world entities. In the initial state of Fig. 13.1, Move(C, A, Floor) binds X to C, Y to A, and Z to Floor, and the preconditions are satisfied; the action may proceed. However, notice two things.

1. The action given above deletes the CLEAR(Floor) assertion, which always should be true. One must fix this somehow; putting CLEAR(Floor) in the add list does the job, but is a little inelegant.
2. What about an action like Move(C, A, C)? It meets the preconditions, but causes trouble when the add and delete lists are applied. One fix here is to keep in the data base ("world model") a set of assertions such as Different(A, B), Different(A, Floor), ..., and to add assertions such as Different(X, Z) to the preconditions of Move.

Such housekeeping chores and details of axiomatization are inherent in applying basically syntactic, formal solution methods to problem solving. For now, let us assume that CLEAR(Floor) is never deleted, and that Move(X, Y, Z) is applied only if Z is different from X and Y.
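Performing Move with these conventions takes only a few lines. The sketch below is minimal and assumes a state is a frozenset of assertion tuples. The names `move` and `INITIAL` are illustrative and do not come from any planner described in the text. The two housekeeping assumptions appear as explicit checks.

```python
from typing import FrozenSet, Optional, Tuple

Assertion = Tuple[str, ...]
State = FrozenSet[Assertion]

# INITIAL STATE of Fig. 13.1 as a set of assertion tuples.
INITIAL: State = frozenset({
    ("ON", "C", "A"), ("ON", "A", "Floor"), ("ON", "B", "Floor"),
    ("CLEAR", "C"), ("CLEAR", "B"), ("CLEAR", "Floor"),
})


def move(state: State, x: str, y: str, z: str) -> Optional[State]:
    """Perform Move(X, Y, Z); return the new state, or None to report failure."""
    # Housekeeping assumption from the text: Z must differ from X and Y.
    if z in (x, y):
        return None
    # Step 2: every bound precondition must be in the data base.
    preconditions = {("CLEAR", x), ("CLEAR", z), ("ON", x, y)}
    if not preconditions <= state:
        return None
    # Step 3: apply the delete list, then the add list.
    delete_list = {("ON", x, y), ("CLEAR", z)}
    add_list = {("ON", x, z), ("CLEAR", y)}
    new_state = (state - delete_list) | add_list
    # Housekeeping assumption from the text: CLEAR(Floor) is never deleted.
    return frozenset(new_state | {("CLEAR", "Floor")})


after = move(INITIAL, "C", "A", "Floor")  # the Move(C, A, Floor) of the text
```

Returning a new frozenset rather than mutating the old one keeps every earlier world state intact. This is the same property the context-oriented data base provides.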
13.1.3 Stacking Blocks

In the block-stacking example, the goal is two simultaneous assertions, ON(A, B) and ON(B, C). One solution method proceeds by repeatedly picking a goal to work on, finding an operator that moves closer to the goal, and applying it. In this case of only one action, the question is how to apply it: what to move where. This is answered by looking at the postconditions of the action in the light of the goal. The reasoning might go like this. ON(B, C) can be made true if X is B and Z is C. That is possible in this state if Y is Floor; all preconditions are satisfied, and the goal ON(B, C) can be achieved with one action.

Part of the world state (or context) tree the planner must search is shown in Fig. 13.2, where states are shown diagrammatically instead of through sets of assertions. Notice the following things in Fig. 13.2.

1. Trying to achieve ON(B, C) first is a mistake (Branch 1).
2. Trying to achieve ON(A, B) first is also a mistake, for less obvious reasons (Branch 2).
3. Branches 1 and 2 show "subgoal interaction." The goals as stated are not independent. Branch 3 must be generated somehow, either through backtracking or some intelligent way of coping with interaction. It will never be found by the single-minded approach of (1) and (2). However, if ON(C, Floor) were one of the goal assertions, Branch 3 could be found.

Clearly, representing the world and actions is not the whole story in planning. Intelligent search of the context tree is also necessary. This search involves subgoal selection, action selection, and action argument selection. Bad choices anywhere can mean inefficient or looping action sequences, or the generation of impossible subgoals. "Intelligent" search implies a meta-level capability: the ability of a program to reason about its own plans. "Plan critics" are often a part of sophisticated planners; one of their main jobs is to isolate and rectify unwanted subgoal interaction [Sussman 1975].
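A blind breadth-first search over the state tree contrasts with the single-minded strategy. It does find Branch 3, because keeping every partial plan amounts to exhaustive backtracking. Breadth-first search is a technique swapped in here for illustration; the text does not prescribe it. The sketch below is minimal. It redefines a STRIPS-style `move` with the conventions of Section 13.1.2 so that it stands alone, and the names `successors`, `plan`, and `GOALS` are hypothetical.

```python
from collections import deque
from itertools import permutations

PLACES = ("A", "B", "C", "Floor")
INITIAL = frozenset({
    ("ON", "C", "A"), ("ON", "A", "Floor"), ("ON", "B", "Floor"),
    ("CLEAR", "C"), ("CLEAR", "B"), ("CLEAR", "Floor"),
})
GOALS = {("ON", "A", "B"), ("ON", "B", "C")}


def move(state, x, y, z):
    """Move(X, Y, Z) with the add/delete lists and housekeeping rules of the text."""
    if z in (x, y):
        return None
    if not {("CLEAR", x), ("CLEAR", z), ("ON", x, y)} <= state:
        return None
    new_state = (state - {("ON", x, y), ("CLEAR", z)}) | {("ON", x, z), ("CLEAR", y)}
    return frozenset(new_state | {("CLEAR", "Floor")})


def successors(state):
    """Yield (action, next_state) for every applicable binding of Move."""
    for x, y, z in permutations(PLACES, 3):
        if x == "Floor":  # the Floor itself is never moved
            continue
        nxt = move(state, x, y, z)
        if nxt is not None:
            yield (x, y, z), nxt


def plan(initial, goals):
    """Breadth-first search of the state tree; returns a shortest action list."""
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, actions = frontier.popleft()
        if goals <= state:
            return actions
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None


print(plan(INITIAL, GOALS))
# [('C', 'A', 'Floor'), ('B', 'Floor', 'C'), ('A', 'Floor', 'B')]
```

This is Branch 3: clear A first, then build the stack from the bottom. Brute force is affordable for three blocks, but the number of states grows rapidly with the number of blocks. That growth is why the text stresses intelligent subgoal and action selection.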
[Figure: a state tree whose three branches are labeled Branch 1, Branch 2, and Branch 3; states are drawn as block diagrams.]
Fig. 13.2 A state tree generated in planning how to stack three blocks.

Intelligent choice of actions is the crux of planning, and is a major research issue. Several avenues have been and are being tried. Perhaps subgoals may be ordered by difficulty and achieved in that order. Perhaps planning should proceed at various levels of detail (like multiresolution image understanding), where the strategic skeleton of a plan is derived without details, and then the details are filled in by applying the planner in more detail to the subgoals in the low-resolution plan.

13.1.4 The Frame Problem

All planning is plagued by aspects of the frame problem (introduced in Section 12.1.6).

1. It is impractical (and boring) to write down in an action all the things that stay the same when an action is applied.
2. Similarly, it is impractical to reassert in the data base all the things that remain true when an action is applied.
3. Often an action has effects that cannot be represented with simple add and delete lists.

The add and delete list mechanism and the context-oriented data base mechanism address the first two problems. The last problem is more troublesome. Add and delete lists are simple ideas, whereas the world is a complex place. In many interesting cases, the add and delete lists depend on the current state of the world when the action is applied. Think of actions TURNBY(θ) and MOVEBY(Z) in a world where orientation and location are important. The orientation and location after an action depend not just on the action but on the state of the world just before the action.

Again, the action may have very complex effects if there are complex dependencies between world objects. Consider the problem of the "monkey and bananas," where the monkey plans to push the box under the bananas and climb on it to reach them (Fig. 13.3). Implementation of realistically powerful add and delete lists may in fact require arbitrary amounts of deduction and computation.

Fig. 13.3 Actions may have complex effects.

This quick précis of symbolic planning does not address many "classical" topics, such as learning or remembering useful plans. Also not discussed are planning at varying levels of abstraction, plans with uncertain information, and plans with costs. The interested reader should consult the References for more information. The next section addresses plans with costs, since they are particularly relevant to vision; some of the other issues appear in the Exercises.
13.2 PLANNING WITH COSTS

Decision making under uncertainty is an important topic in its own right, being of interest to policy makers and managers [Raiffa 1968]. Analytic techniques that can derive the strategy with the "optimal expected outcome" or "maximal expected utility" can be based on Bayesian models of probability. In [Feldman and Sproull 1977] these techniques are explored in the context of action planning for real-world actions and vision. As an example of the techniques, they are used to model an extended version of the "monkey and bananas" problem of the last section, with multiple boxes but without the maddening pulley arrangement. In the extended problem, there are boxes of different weights which may or may not support the monkey, and he can apply tests (e.g., vision) at some cost to determine whether they are usable. Pushing weighted boxes costs some effort, and the gratification of eating the bananas is "worth" only some finite amount of effort. This extended set of considerations is more like everyday decision making in the number of factors that need balancing, in the uncertainty inherent in the universe, and in the richness of applicable tests. In fact, one might make the claim that human beings always "maximize their expected utility," and if one knew a person's utility functions, his behavior would become predictable. The more intuitive claim that human beings plan only as far as "sufficient expected utility" can be cast as a maximization operation with a nonzero "cost of planning."

The sequential decision-making model of planning, with the goal of maximizing the goodness of the expected outcome, was used in a travel planner [Sproull 1977]. Knowledge of schedules and costs of various modes of transportation and the attendant risks could be combined with personal prejudices and preferences to produce an itinerary with the maximum expected utility. If unexpected situations (canceled flights, say) arose en route, replanning could be initiated; this incremental plan ramification is a natural extension of sequential decision making.

This section is concerned with measuring the expected performance of plans using a single number.
Although one might expect one number to be inadequate, the central theorem of decision theory [DeGroot 1970] shows essentially that one number is enough. Using a numerical measure of goodness allows comparisons between normally incomparable concepts to be made easily. Quite frequently numerical scores are directly relevant to the issues at stake in planning, so they are not obnoxiously reductionistic. Decision theory can also help in the process of applying a plan: the basic plan may be simple, but its application to the world may be complex, in terms of when to declare a result established or an action unsuccessful. The decision-theoretic approach has been used in several artificial intelligence and vision programs [Feldman and Yakimovsky 1974; Bolles 1977; Garvey 1976; Ballard 1978; Sproull 1977].
13.2.1 Planning, Scoring, and Their Interaction

For didactic purposes, the processes of plan generation and plan scoring are considered separately. In fact, these processes may cooperate more or less intimately. The planner produces "sequences" of actions for evaluation by the scorer. Each action (computation, information gathering, performing a real-world action) has a cost, expressing expenditure of resources or associated unhappiness. An action has a set of possible outcomes, of which only one will really occur when the action is performed. A goal is a state of the world with an associated "happiness" or utility. For the purposes of uniformity and formal manipulation, goals are treated as (null) actions with no outcomes, and negative utilities are used to express costs. Then the plan has only actions in it; they may be arranged in a strict sequence, be in loops, be conditional on outcomes of other actions, and so forth.

The scoring process evaluates the expected utility of a plan. In an uncertain world, a plan prior to execution has only an expected goodness, since something might go wrong. Such a scoring process typically is not of interest to those who would use planners to solve puzzles or do proofs; what is interesting is the result, not the effort. But plans that are "optimal" in some sense are decidedly of interest in real-world decision making. In a vision context, plans are usually useful only if they can be evaluated for efficiency and efficacy.

Scoring can take place on "complete" plans, but it can also be used to guide plan generation. The usual artificial intelligence problem-solving techniques of progressive deepening search and branch-and-bound pruning may be applied to planning if scoring happens as the plan is generated [Nilsson 1980]. Scoring can be used to assess the cost of planning and to monitor planning horizons (how far ahead to look and how detailed to make the plan). Scoring will penalize plans that loop without producing results.
Plan improvements, such as replanning upon failure, can be assessed with scores, and the contribution of additional steps (say for extra information gathering) can be assessed dynamically by scoring. Scoring can use arbitrarily complex utility functions, thus reflecting such concepts as "risk aversion" and nonlinear value of resources [Raiffa 1968].
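For the simple (loop-free) plans formalized in Section 13.2.2, the scoring process reduces to a short recursion. The Python sketch below is minimal and assumes a plan is a tree of `Node` objects whose outcomes carry probabilities. The class, its field names, and every utility and probability in the example are hypothetical; they are not the values of Figs. 13.4 and 13.5.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """An action or goal in a simple (loop-free) plan tree."""
    name: str
    utility: float  # negative utilities express costs
    # Each outcome: (label, probability, next node). Goals have no outcomes.
    outcomes: List[Tuple[str, float, "Node"]] = field(default_factory=list)


def expected_utility(node: Node, p_reach: float = 1.0) -> float:
    """Utility times the probability of reaching the node, plus the
    expected utilities of the actions at its possible outcomes."""
    if node.outcomes:
        total = sum(p for _, p, _ in node.outcomes)
        assert abs(total - 1.0) < 1e-9, f"outcomes of {node.name} must sum to 1"
    return node.utility * p_reach + sum(
        expected_utility(child, p_reach * p) for _, p, child in node.outcomes
    )


# Hypothetical numbers, loosely shaped like a telephone-finding plan.
find_shape = Node("look for telephone shape", -5, [
    ("shape found", 0.8, Node("report telephone", 50)),
    ("shape not found", 0.2, Node("report no telephone", 0)),
])
root = Node("locate table", -10, [
    ("table located", 0.5, find_shape),
    ("table not located", 0.5, Node("give up", -20)),
])
print(expected_utility(root))  # -2.5
```

The probability of reaching "report telephone" here is 0.5 × 0.8 = 0.4, the product of outcome probabilities along its path. The recursion computes this by passing `p_reach * p` down the tree.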
13.2.2 Scoring Simple Plans

Scoring and an Example

A simple plan is a tree of nodes (there are no loops). The nodes represent actions (and goals). Outcomes are represented by labeled arcs in the tree. A probability of occurrence is associated with each possible outcome. Since exactly one outcome actually occurs per action, the probabilities for the possible outcomes of any action sum to unity.

The score of a plan is its expected utility. The expected utility of any node is recursively defined as its utility times the probability of reaching that node in the plan, plus the expected utilities of the actions at its (possible) outcomes. The probability of reaching any "goal state" in the plan is the product of the probabilities of the outcomes forming a path from the root of the plan to the goal state.

As an example, consider the plan shown in Fig. 13.4. If the plan of Fig. 13.4 has probabilities assigned to its outcomes, we may compute its expected utility. Figure 13.5 shows the calculation. The probability of correctly finding the telephone is 0.34, and the expected utility of the plan is 433.

Although the generation of a plan may not be easy, scoring a plan is a trivial exercise once the probabilities and utilities are known. In practice, the assignment of probabilities is usually a source of difficulty. The following is an example using the telephone-finding plan and some assumptions about the tests. Different assumptions yield different scores.

[Figure: plan tree whose outcome arcs are "Table located" / "Table not located," "Find telephone shape" / "Do not find telephone shape," and "Telephone there" / "No telephone there," labeled with outcome probabilities such as P13.]
Fig. 13.4 This plan to find a telephone in an office scene involves finding a table first and looking there in more detail. The actions and outcomes are shown. The probabilities of outcomes are assigned symbols (P10, etc.). Utilities (denoted by U:) are given for the individual actions. Note that negative utilities may be considered costs. In this example, decision making takes no effort, image processing costs vary, and there are various penalties and rewards for correct and incorrect finding of the telephone.

[Figure: the plan of Fig. 13.4 with numerical utilities and outcome probabilities filled in.]
Fig. 13.5 As for Fig. 13.4. U: gives the utility of each action. E(U): gives the expected utility of the action, which depends on the outcomes below it. Values for outcome probabilities are given on the outcome arcs.

Computing Outcome Probabilities: An Example

This example relies heavily on Bayes' rule:

$$P(B \mid A)\,P(A) = P(A \wedge B) = P(A \mid B)\,P(B) \qquad (13.1)$$

Let us assume a specific a priori probability that the scene contains a telephone.

$$P_1 = \text{a priori probability of telephone} \tag{13.2}$$

Also assume that something is known about the behavior of the various tests in the presence of what they are looking for. This knowledge may accrue from experiments to see how often the table test found tables when telephones (or tables) were and were not present. Let us assume that the following are known probabilities.

$$P_3 = P(\text{table located} \mid \text{telephone in scene}) \tag{13.3}$$
$$P_5 = P(\text{table located} \mid \text{no telephone in scene}) \tag{13.4}$$

Either there is a telephone or there is not, and a table is located or it is not, so

$$P_2 = \text{a priori probability of no telephone} = 1 - P_1 \tag{13.5}$$
$$P_4 = P(\text{no table located} \mid \text{telephone in scene}) = 1 - P_3 \tag{13.6}$$
$$P_6 = P(\text{no table located} \mid \text{no telephone in scene}) = 1 - P_5 \tag{13.7}$$

Similarly with the "shape test" for telephones: assume probabilities

$$P_7 = P(\text{telephone shape located} \mid \text{telephone}) \tag{13.8}$$
$$P_9 = P(\text{telephone shape located} \mid \text{no telephone}) \tag{13.9}$$

with

$$P_8 = 1 - P_7, \qquad P_{10} = 1 - P_9 \tag{13.10}$$

as above.

There are a few points to make. First, it is not necessary to know exactly these probabilities in order to score the plan; one could use related probabilities and Bayes' rule. Other useful probabilities are of the form P(telephone | telephone shape located). In some systems [Garvey 1976] these are assumed to be available directly. This section shows how to derive them from known conditional probabilities that describe the behavior of detectors given certain scene phenomena.

Second, notice the assumption that although the outcomes of both the table test and the shape test depend on the presence of telephones, the two tests are taken to be independent of each other once the presence or absence of a telephone is fixed. That is, beyond what it says about the telephone, having found a table tells us nothing about the likelihood of finding telephone shape. Independence assumptions such as this are useful to limit computations and data gathering, but can be somewhat unrealistic. To account for the dependence, one would have to measure such quantities as P(telephone shape found | table located).

Now to compute some outcome probabilities. Consider the probability

$$P_{11} = P(\text{table located}) \tag{13.11}$$

Let us write TL for Table Located and TNL for Table Not Located. A table may be located whether or not a telephone is in the scene. In terms of known probabilities, Bayes' rule, summed over the two cases (telephone and no telephone), yields

$$P_{11} = P_3 P_1 + P_5 P_2 \tag{13.12}$$

Then

$$P_{12} = P(\text{TNL}) = 1 - P_{11} \tag{13.13}$$

Calculating $P_{13}$ shows a neat trick using Bayes' rule:

$$P_{13} = P(\text{telephone} \mid \text{TNL}) \tag{13.14}$$

That is, $P_{13}$ is the probability that there is a telephone in the scene given that the search for a table was unsuccessful. This probability is not known directly, but

$$P_{13} = \frac{P(\text{telephone} \wedge \text{TNL})}{P(\text{TNL})} = \frac{P(\text{TNL} \mid \text{telephone})\,P(\text{telephone})}{P_{12}} = \frac{P_4 P_1}{P_{12}} \tag{13.15}$$

Then, of course,

$$P_{14} = 1 - P_{13} \tag{13.16}$$

Reasoning in this way using the conditional probabilities and assumptions about their independence allows the completion of the calculation of outcome probabilities (see the Exercises). One possibly confusing point occurs in the calculation of $P_{15}$, which is

$$P_{15} = P(\text{telephone shape found} \mid \text{table located}) \tag{13.17}$$

By assumption, these events are only indirectly related. By the simplifying assumptions of independence, the shape operator and the table operator are independent in their operation. (Such assumptions might be false if they used common image processing subroutines, for example.) Of course, the probability of success of each depends on the presence of a telephone in the scene. Therefore their performance is linked in the following way (see the Exercises). (Write TSL for Telephone Shape Located.)

$$P_{15} = P(\text{TSL} \mid \text{TL}) = P(\text{TSL} \mid \text{telephone})\,P(\text{telephone} \mid \text{TL}) + P(\text{TSL} \mid \text{no telephone})\,P(\text{no telephone} \mid \text{TL}) \tag{13.18}$$
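The chain of outcome probabilities can be checked numerically. The following Python sketch is a hypothetical helper: the function name and the sample probabilities are illustrative assumptions, not values from the text. Like the text, it assumes the table and shape tests are independent once the presence or absence of a telephone is fixed.

```python
def outcome_probabilities(p1, p3, p5, p7, p9):
    """Outcome probabilities for the table/telephone-shape plan of Sec. 13.2.2.

    p1: a priori P(telephone)
    p3: P(table located | telephone)   p5: P(table located | no telephone)
    p7: P(shape located | telephone)   p9: P(shape located | no telephone)
    """
    p2 = 1.0 - p1                 # Eq. (13.5): P(no telephone)
    p4 = 1.0 - p3                 # Eq. (13.6): P(TNL | telephone)
    p11 = p3 * p1 + p5 * p2       # Eq. (13.12): P(TL)
    p12 = 1.0 - p11               # Eq. (13.13): P(TNL)
    p13 = p4 * p1 / p12           # Eq. (13.15): P(telephone | TNL)
    p14 = 1.0 - p13               # Eq. (13.16)
    p_tel_tl = p3 * p1 / p11      # Bayes' rule: P(telephone | TL)
    # Eq. (13.18): the two tests interact only through the telephone's presence
    p15 = p7 * p_tel_tl + p9 * (1.0 - p_tel_tl)
    return {"P11": p11, "P12": p12, "P13": p13, "P14": p14, "P15": p15}


# Illustrative numbers only
probs = outcome_probabilities(p1=0.5, p3=0.8, p5=0.2, p7=0.9, p9=0.1)
print(probs)
```

With these sample numbers, failing to find a table lowers the probability of a telephone from 0.5 to 0.2 ($P_{13}$), while finding one raises the chance that the shape test fires from 0.5 to 0.74 ($P_{15}$).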


13.2.3 Scoring Enhanced Plans

The plans of Section 13.2.2 were called "simple" because of their tree structure, complete ordering of actions, and the simple actions of their nodes. With a richer output from the symbolic planner, the plans may have different structure. For example, there may be OR nodes, any one of whose sons will achieve the action at the node; AND nodes, all of whose sons must be satisfied (in any order) for the action to be satisfactorily completed; and SEQUENCE nodes, which specify a set of actions and a particular order in which to achieve them. The plan may have loops, shared subgoal structure, or goals that depend on each other. How enhanced plans are interpreted and executed depends on the scoring algorithms, the possibilities of parallel execution, whether execution and scoring are interleaved, and so forth. This treatment ignores parallelism and limits discussion to expanding enhanced plans into simple ones.

It should be clear how to go about converting many of these enhanced plans to simple plans. For instance, sequence nodes simply go to a unique path of actions. Alternatively, depending on assumptions about outcomes of such actions (say, whether they can fail), they may be coalesced into one action, as was the "threshold, find blobs, and compute shapes" action in the telephone-finding plan. Rather more interesting are the OR and AND nodes, the order of whose subgoals is unspecified. Each such node yields many simple plans, depending on the order in which the subgoals are attacked. One way to score such a plan is to generate all possible simple plans and score each one, but perhaps it is possible to do better.

For example, loops and mutual dependencies in plans can be dealt with in various ways. A loop can be analyzed to make sure that it contains an exit (such as a branch of an OR node that can be executed). One can make ad hoc assumptions that the cost of execution is always more than the cost of planning [Garvey 1976], and score the loop by its executable branch. Another idea is to plan incrementally with a finite horizon, expanding the plan through some progressive deepening, heuristic search, or pruning strategy. The accumulated cost of going around a loop will soon remove it from further consideration.

Recall (Figs. 13.4 and 13.5) that the expected utility of a plan was defined as the sum of the utility of each leaf node times the probability of reaching that node. However, the utilities need not combine linearly in scoring. Different monotonic functions of utility express such different conceptions as "aversion to risk" or "gambling addiction." These considerations are real ones, and nonlinear utilities are the rule rather than the exception. For instance, the value of money is notoriously nonlinear. Many people would pay $5 for an even chance to win $15; not so many people would pay $5,000 for an even chance to win $15,000.
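The linear and nonlinear views of expected utility can be compared in a short Python sketch. It is an illustration under stated assumptions, not the book's code: plans are encoded as nested tuples, and the names are hypothetical. One function follows the recursive node-by-node definition of Section 13.2.2; the other sums probability times a utility function `u` of the accumulated path utility over the leaves. For the identity `u` the two agree. The exponential risk-averse `u` and its scale constant of 5000 are arbitrary choices used only to show the $5 and $5,000 bets from the text diverging.

```python
import math

# A plan node is (utility, [(probability_of_outcome, child_node), ...]);
# a node with no outcomes is a leaf.

def expected_utility_recursive(node, p_reach=1.0):
    """Own utility times P(reaching the node), plus the expected
    utilities of the actions at its outcomes (Sec. 13.2.2)."""
    utility, outcomes = node
    return utility * p_reach + sum(
        expected_utility_recursive(child, p_reach * p) for p, child in outcomes)

def leaf_outcomes(node, p=1.0, total=0.0):
    """Yield (probability, accumulated utility) for each root-to-leaf path."""
    utility, outcomes = node
    total += utility
    if not outcomes:
        yield p, total
        return
    for p_out, child in outcomes:
        yield from leaf_outcomes(child, p * p_out, total)

def expected_utility(node, u=lambda x: x):
    """Expected value of u(path utility); the identity u gives the linear score."""
    return sum(p * u(total) for p, total in leaf_outcomes(node))

def bet(stake, prize):
    """Pay `stake`, then win `prize` with probability 0.5."""
    return (-stake, [(0.5, (prize, [])), (0.5, (0.0, []))])

def risk_averse(x):
    # Hypothetical concave utility; the scale 5000 is arbitrary.
    return 1.0 - math.exp(-x / 5000.0)

small, large = bet(5, 15), bet(5000, 15000)
print(expected_utility(small), expected_utility(large))          # 2.5 2500.0
print(expected_utility(small, risk_averse) > 0,
      expected_utility(large, risk_averse) > 0)                   # True False
```

Both bets have positive expected monetary value, but under the concave utility only the small one is still worth taking.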

One common way to compute scores based on utilities is the "cost/benefit" ratio. This, in the form of a "cost/confidence" ratio, is used by Garvey in his planning vision system. This measure is examined in Section 13.2.5; roughly, his "cost" was the effort in machine cycles to achieve goals, and his "confidence" approximated the probability of a goal achieving the correct outcome. The utility of correct outcomes was not explicitly encoded in his planner.

Sequential plan elaboration or partial plan elaboration can be interleaved with execution and scoring. Most practical planning is done in interaction with the world, and the plan-scoring approach lends itself well to assessing such interactions. Section 13.2.5 considers a planning vision system that uses enhanced plans and a limited replanning capability.

A thorny problem for decision making is to assess the cost of planning itself. The planning process is given its own utility (cost), and is carried out only as far as is indicated. Of course, the problem is in general infinitely recursive, since there is also the cost of assessing the cost of planning, etc. If, however, there is a known upper bound on the utility of the best achievable plan, then it is known that infinite planning could not improve on it. This sort of reasoning is weaker than that needed to give the expected benefits of planning; it measures only the cost and maximum value of planning.

Another more advanced consideration is that the results of actions can be continuous and multidimensional, and discrete probabilities can be extended to probability distribution functions. Such techniques can reflect the precision of measurements.

An obviously desirable extension to a planner is a "learner" that can abstract rules for action applicability and remember successful plans. One approach would be to derive and remember ranges of planning parameters arising during execution; a range could be associated with a rule specifying appropriate action. This problem is difficult and the subject of current research.
13.2.4 Practical Simplifications

The expected utility calculations allow plans to be evaluated in a more or less "realistic" manner. However, in order to complete the calculations certain probabilities are necessary, and many of these reflect detailed knowledge about the interaction of phenomena in the world. It is thus often impractical to go about a full-blown treatment of scoring in the style of Section 13.2.2. This section presents some possible simplifications.

Of course, in many planning problems, such as those whose costs are nil or irrelevant, or all of whose goals are equally valuable, there is no need to address the utility of plans at all. Such plans are typically not concerned with expenditure of real-world or planning resources.

Independence of various probabilities is one of the most helpful and pervasive assumptions in the calculation of probabilities. An example appeared in Section 13.2.2 with the table and telephone shape detectors.

Certain information can be ignored. Garvey [Garvey 1976] ignores failure information. His planning parameters include the "cost" of an action (strictly, negative utilities reflecting effort), the probability of the action "succeeding," and the conditional probability that the state of the world is correctly indicated, given success. Related to ignoring some information is the assumption that certain outcomes are more reliable than others. For instance, the decision not to plan past "failure" reports means that they are assumed reliable.

Non-Bayesian rules of inference abound in planners [Shortliffe 1976]; the idea of assigning a single numerical utility score to plans is by no means the only way to make decisions.
13.2.5 A Vision System Based on Planning

Overview

This section outlines some features of a working vision system whose actions are controlled by the planning paradigm [Garvey 1976]. As with all large vision systems, this work addresses more issues than the use of planning as a control mechanism. For one thing, the system uses multisensory input, including range and color information. An interactive facility aids in developing and testing low-level operators and "strategies" for object location. The machine-usable representation of knowledge about the objects in the scene domains, and about how they could be located, is of course a central component.

The domain is office scenes (Fig. 13.6). For the task of locating different objects in such scenes, a "uniform strategy" is adopted. That is, the vision task is always broken down into a sequence of major goals to be performed in order. Such uniform strategies, if they are imposed on a system at all, tend to vary with different tasks, with different sensors or domains, or with different research goals. Garvey's uniform strategy consists of the following steps.

1. Acquire some pixels thought to be in the desired region (the area of scene making up the image of the desired object).
2. Verify to some confidence that indeed the region was the desired one.
3. Bound the region accurately.

Fig. 13.6 The planning vision system uses input scenes such as these, imaged in different wavelengths and with a rangefinder.

The plan generation, scoring, and execution used in the system are outlined in the following paragraphs. The plans generated by the system are typically enhanced versions of plans like the telephone finder. Plan scoring proceeds as expected for such plans; allowances are made for the enhanced semantics of plan nodes. A "cost/confidence" scoring function is used, and various practical simplifications are made that do not affect the planning paradigm itself.

An Example Plan and Its Execution

The system's plans are enhanced plans, in the sense of Section 13.2.3. Actions can be AND, OR, or SEQUENCE actions, and shared plan structure and loops are permitted. Loops that contain only internal, planning actions would never terminate. However, a loop with an OR node can terminate (has an exit) if one of the subactions of the OR is executable. A plan for locating a chair in an office scene is shown in Fig. 13.7. In Fig. 13.7, the acquire-validate-bound strategy is evident in the two SEQUENCE subgoals of the Find Chair main goal, which is an AND goal. The loop in the plan is evident, and makes sense here because planning is often done for information gathering, not for real-world actions.

As noted in Section 13.2.3, an enhanced plan may not be completely specified. If it is to be executed one subgoal at a time (no parallelism is allowed), sequences of subactions must be determined for its AND and OR actions. In Garvey's planner, these sequences are determined initially on the basis of a priori information, but the partial results of actions are "fed back," so that dynamic rescoring, and hence dynamic reordering of goal sequences, is possible. For example, if one subgoal of an AND action fails, the AND action is abandoned. Thus this planner is to some degree incremental.

In execution, Fig. 13.7 might result in the sequence of actions depicted in Fig. 13.8. The acquisition phase of object location has the most alternatives, so plan-generation effort is mainly spent there. Acquisition proceeds either directly or indirectly. Direct acquisition is the classification of input data gathered from a random sampling of a window in the image; the input data are rich enough to allow basic pattern recognition techniques to identify the source of individual pixels. Indirect acquisition is the use of the location of other "objects" (really identified regions) in the scene to locate the desired region. The desired region might be found by "scanning" vertically or horizontally from the already identified region, for instance. The idea is a planning version of a common one (e.g., the geometric location networks of Section 10.3.2): use something already located to limit and direct search for something else.

Fig. 13.7 An enhanced plan to locate a chair in an office scene. Untied multiple arcs denote OR actions, arcs tied together denote AND actions, and those with *'s denote SEQUENCE actions. The loop in the plan has executable exits.

[Fig. 13.8 images, panels (a) to (f)]

Fig. 13.8 The plan of Fig. 13.7 finds the most promising execution sequence for finding the chair in the scene of Fig. 13.6: find the seat first, then scan upwards from the seat looking for the back. Acquisition of the seat proceeds by sampling (a), followed by classification (b). The validation procedure eliminates nonchair points (c), and the bounding procedure produces the seat region (d). To find the back, scanning proceeds in the manner indicated by (e) (actually fewer points are examined in each scan). The back is acquired and bounded, leading to the final location of the chair regions (f).

Plan Generation

A plan such as Fig. 13.7 is "elaborated" from the basic Find Chair goal by recursively expanding goals. Some goals (such as to find a chair) are not directly executable; they need further elaboration. Elaboration continues until all the subgoals are executable. Executable subgoals are those that analyze the image, run filters and detectors over parts of it, and generate decisions about the presence or absence of image phenomena. This straightforward elaboration is akin to macro expansion, and is not a very sophisticated planning mechanism (the program cannot criticize and manipulate the plan, only score it). A fully elaborated plan is presented for scoring and execution.

The elaboration process, or planner, has at its disposal several sorts of knowledge embodied as modules that can generate subgoals for a goal. Some are general (to find something, find all its parts); some are less general (a chair has a back and a seat); some are quite specific, being perhaps programs arising from an earlier interactive method-generation phase. The elaborator is guided by information stored about objects, for instance this about a tabletop:

    OBJECT      TableTop
    PROPERTIES  Hue: 26-58
                Sat.: 0.23-0.32
                Bright.: 18-26
                Height: 26-28
                Orient.: 77
    RELATIONS   Supports Telephone 0.6
                Supports Book 0.4
                Occludes Wall 1

Here the orientation information indicates a vertical surface normal. The planner knows that it has a method of locating horizontal surfaces, and the plan elaborator can thus create a goal of direct acquisition by first locating a horizontal plane. The relational information allows for indirect acquisition plans. The elaborator puts direct and indirect alternatives under an OR node in the plan. Information not used for acquisition (height, color) may be used for validation.

Loops may occur in an elaborated plan because each newly generated goal is checked against goals already existing. Should it or an equivalent goal already exist, the existing goal is substituted for the newly generated one. Goals may thus have more than one ancestor, and may depend on one another.
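The goal-sharing rule can be sketched in Python. This is a hypothetical illustration, not Garvey's program: the `Goal` class, the `elaborate` function, and the rule table (loosely modeled on the chair plan of Fig. 13.7) are assumptions. Goals are kept in a table keyed by name, so a newly generated goal that already exists is replaced by the existing one; that substitution is what produces shared structure and loops.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)              # compare goals by identity; the plan graph may be cyclic
class Goal:
    name: str
    kind: str = "EXEC"            # "AND", "OR", "SEQUENCE", or "EXEC" (directly executable)
    # repr=False avoids infinite recursion when printing a plan that contains a loop
    subgoals: list = field(default_factory=list, repr=False)

def elaborate(name, rules, table=None):
    """Recursively expand goal `name` until every leaf is executable.

    rules: maps a non-executable goal name to (kind, [subgoal names]);
           names not in `rules` are treated as directly executable.
    table: goals generated so far, keyed by name.
    """
    if table is None:
        table = {}
    if name in table:             # an equivalent goal already exists: reuse it
        return table[name]
    goal = Goal(name)
    table[name] = goal            # register before expanding, so loops terminate
    if name in rules:
        goal.kind, children = rules[name]
        goal.subgoals = [elaborate(child, rules, table) for child in children]
    return goal

# Hypothetical rules: each chair part can be acquired directly (by sampling)
# or indirectly (by scanning from the other part), which creates a loop.
rules = {
    "find chair": ("AND", ["find seat", "find back"]),
    "find seat": ("SEQUENCE", ["acquire seat", "validate seat", "bound seat"]),
    "find back": ("SEQUENCE", ["acquire back", "validate back", "bound back"]),
    "acquire seat": ("OR", ["sample for seat", "scan down from back"]),
    "acquire back": ("OR", ["sample for back", "scan up from seat"]),
    "scan down from back": ("SEQUENCE", ["find back", "scan downward"]),
    "scan up from seat": ("SEQUENCE", ["find seat", "scan upward"]),
}

plan = elaborate("find chair", rules)
seat, back = plan.subgoals
```

In this sketch the "find back" goal reached through "scan down from back" is the very object that appears under "find chair", and each OR node keeps an executable sampling alternative, so the loop has an exit.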

At this stage, the planner does not use any planning parameters (cost, utilities, etc.); it is strictly symbolic. As mentioned above, important information about execution sequences in an enhanced plan is provided by scoring.

Plan Scoring and Execution

The scoring in the vision plan is a version of that explained in Sections 13.2.2 through 13.2.4. Each action in a plan is assumed either to succeed (S) in locating an object or to fail. Each action may report either success ("S") or failure. An action is assumed to report failure correctly, but possibly to be in error in reporting success. Each action has three "planning parameters" associated with it. They are C, its "cost" (in machine cycles); $P(\text{"S"})$, the probability of its reporting success; and $P(S \mid \text{"S"})$, the probability of success given a report of success. As shown earlier, the product

$$P(S \mid \text{"S"})\,P(\text{"S"}) \tag{13.19}$$

is the probability that the action has correctly located an object and reported success. This product is called the "confidence" of the action. An action has structure as shown in Fig. 13.9. The score of an action is computed as

$$\text{score} = \frac{C}{\text{confidence}} \tag{13.20}$$

The planner thus must minimize the score.

The initial planning parameters of an executable action typically are determined by experimentation. The parameters of internal (AND, OR, SEQUENCE) actions are derived by the scoring methods alluded to in Sections 13.2.2 and 13.2.3 and the Exercises (there are a few idiosyncratic ad hoc adjustments).

It may bear repeating that planning, scoring, and execution are not separated temporally in this system. Scoring is used after the enhanced plan is generated to derive a simple plan (with ordered subgoals). Execution can affect the scores of nodes, and so execution can alternate with "replanning" (really rescoring resulting in a reordering). Recall the example of failure of an AND or SEQUENCE subgoal, which can immediately fail the entire goal. More generally, the entire goal and ultimately the plan may be rescored. For instance, the parameters of a successful action are modified by setting the cost of the executed action to 0 and its confidence to its second parameter, $P(S \mid \text{"S"})$.
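The planning parameters and Eqs. (13.19) and (13.20) can be sketched as follows. The `Action` class, its method names, and the numeric parameters are hypothetical. Setting the report probability to 1 after a success is simply one way to make the confidence equal $P(S \mid \text{"S"})$, as the text describes.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    cost: float        # C, e.g. in machine cycles
    p_report: float    # P("S"): probability that the action reports success
    p_correct: float   # P(S | "S"): probability of success given a success report

    def confidence(self):
        # Eq. (13.19): probability of correctly locating the object and reporting success
        return self.p_correct * self.p_report

    def score(self):
        # Eq. (13.20): cost/confidence; the planner prefers the smallest score
        return self.cost / self.confidence()

    def record_success(self):
        # After a reported success: cost becomes 0 and confidence becomes P(S | "S")
        self.cost = 0.0
        self.p_report = 1.0

# Two hypothetical alternatives under an OR node, tried in order of increasing score
direct = Action("direct acquisition", cost=100.0, p_report=0.5, p_correct=0.8)
indirect = Action("indirect acquisition", cost=40.0, p_report=0.8, p_correct=0.5)
order = sorted([direct, indirect], key=Action.score)
print([a.name for a in order])   # ['indirect acquisition', 'direct acquisition']
```

Both alternatives here have the same confidence (0.4), so the cheaper one is tried first; with equal costs, the more confident one would be.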
Given a scored plan, execution is then easy; the execution program starts at the top goal of the plan, working its way down the best path as defined by the scores of the nodes it encounters. When an executable subgoal is found (e.g., "look for a green region"), it is passed to an evaluation function that "runs" the action associated with the subgoal. The subgoal is either achieved or not; in either case, information about its outcome is propagated back up the plan. Failure is easy: a failed subgoal of an AND or SEQUENCE goal fails the goal, and this failure is propagated. A failed subgoal of an OR goal is removed from the plan. The use of success information is more complex, involving the adjustment of confidences and planning parameters illustrated above.
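The failure rules just described can be sketched as a recursive executor over an already ordered simple plan. This Python fragment is a simplification under stated assumptions: the node tuples and the `run` callback are hypothetical, subgoals are assumed to be listed in score order, and the rescoring that success triggers in Garvey's system is omitted.

```python
def execute(node, run):
    """Execute plan `node`; return True if it is achieved.

    node: ("EXEC", name) or (kind, [children]) with kind "AND", "SEQUENCE", or "OR";
          children are assumed to be already ordered by score.
    run:  callback that runs an executable subgoal and returns True or False.
    """
    kind, body = node
    if kind == "EXEC":
        return run(body)
    if kind in ("AND", "SEQUENCE"):
        # all() stops at the first failed subgoal, which fails the whole goal
        return all(execute(child, run) for child in body)
    if kind == "OR":
        # any() drops a failed alternative and tries the next one
        return any(execute(child, run) for child in body)
    raise ValueError(f"unknown node kind: {kind!r}")

log = []

def run(name):
    log.append(name)
    return name != "sample for back"   # pretend direct sampling for the back fails

plan = ("AND", [("EXEC", "find seat"),
                ("OR", [("EXEC", "sample for back"), ("EXEC", "scan up from seat")])])
print(execute(plan, run), log)
# True ['find seat', 'sample for back', 'scan up from seat']
```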

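[Fig. 13.9 diagram: the detector reports "success" with probability P("success") or "failure" with probability 1 - P("success"); after a "success" report, the object is present with probability P(object | "success") (correctly decide object present) or not present with probability 1 - P(object | "success") (incorrectly decide object present).]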

Fig. 13.9 This is the microstructure of a node ("action") of Garvey's planning system in terms of simple plans. Think of actions as being object detectors that announce "Found" or "Not Found." Garvey's planning parameters are P("Found") and P(Object is there | "Found"). Confidence in the action is their product; it is the probability of correctly detecting the object. All other outcomes are lumped together and not used for planning.

After the outcome of a goal is used to adjust the parameters of other goals, the plan is rescored and another cycle of execution is performed. The execution can use knowledge about the image picked up along the way by prior execution. This is how results (such as acquired pixels) are passed to later processing stages (such as the validation process). Such a mechanism can even be used to remember successful subplans for later use.

EXERCISES

13.1 Complete the computation of outcome probabilities in the style of Section 13.2.2, using the assumptions given there. Check your work by showing (symbolically) that the probabilities of getting to the terminal actions ("goal states") of the plan sum to 1.

13.2 Assume in Section 13.2.2 that the results of the "table" and "telephone shape" detectors are not independent. Formulate your assumptions and compute the new outcome probabilities for Fig. 13.4.

13.3 Show that

$$P(A \mid B \wedge C) = \frac{P(B \mid A \wedge C)\,P(A \mid C)}{P(B \mid C)}$$

13.4 B and C are independent if $P(B \wedge C) = P(B)\,P(C)$. Assuming that B and C are independent (for the last two identities, independent given A), show that

$$P(B \mid C) = P(B)$$
$$P(B \wedge C \mid A) = P(B \mid A)\,P(C \mid A)$$
$$P(B \mid A \wedge C) = P(B \mid A)$$

13.5 Starting from the fact that

$$P(A \wedge B) = P(A \wedge B \wedge C) + P(A \wedge B \wedge \neg C)$$

show how $P_{15}$ was computed in Section 13.2.2.

13.6 A sequence D(N) of N detectors is used to detect an object; the detectors either succeed or fail. Detector outputs are assumed independent of each other, being conditioned only on the object. Using previous results, show that the probability of an object being detected by applying a sequence of N detectors D(N) is recursively rewritable in terms of the output of the first detector $D_1$ and the remaining sequence D(N - 1) as

$$P(O \mid D(N)) = \frac{P(D_1 \mid O)\,P(O \mid D(N-1))}{P(D_1 \mid D(N-1))}$$

13.7 Consider scoring a plan containing an OR node (action). Presumably, each subgoal of the OR has an expected utility. The OR action is achieved as soon as one of the subgoals is achieved. Is it possible to order the subgoals for trial so as to maximize the expected utility of the plan? (This amounts to a unique "best" rewriting of the plan to make it a simple plan.)

13.8 Answer question 13.7 for an AND node; remember that the AND will fail as soon as any of its subgoals fails.

13.9 What can you say about how the cost/confidence ratio of Garvey's planner is related to the expected utility calculations of Section 13.2.2?

13.10 You are at Dandy Dan's used car lot. Consumer Reports says that the a priori probability that any car at Dandy Dan's is a lemon is high. You know, though, that to test a car you kick its tire. In fact, with probability:

$P(\text{"C"} \mid C)$: a kick correctly announces "cream puff" when the car actually is a cream puff
$P(\text{"C"} \mid L)$: a kick incorrectly announces "cream puff" when the car is actually a lemon
$P(L)$: the a priori probability that the car is a lemon

Your plan for dealing with Dandy Dan is shown below; give expressions for the probabilities of arriving at the nodes labeled S1, S2, F1, F2, and F3. Give numeric answers using the following values:

$P(\text{"C"} \mid C) = 0.5$, $P(\text{"C"} \mid L) = 0.5$, $P(L) = 0.75$

[Ex. 13.10 diagram: plan with outcome arcs Kick reports "cream puff" / Kick reports "lemon", a further Kick reports "cream puff" arc, and the outcome "Chevy is a cream puff".]

13.11 Two bunches of bananas are in a room with a monkey and a box. One of the bunches is lying on the floor; the other is hanging from the ceiling. One of the bunches is made of wax. The box may be made of flimsy cardboard. Given that:

$P(WH) = 0.2$: probability that the hanging bananas are wax
$P(WL) = 0.8$: probability that the lying bananas are wax
$P(C) = 0.5$: probability that the box is cardboard
$U(\text{eat}) = 200$: utility of eating a bunch of bananas
$C(\text{walk}) = 10$: cost of walking a unit distance
$C(\text{push}) = 20$: cost of pushing the box a unit distance
$C(\text{climb}) = 20$: cost of climbing up on the box

(a) Analyze two different plans for the monkey, showing all paths and calculations. Give criteria (based upon extra information not given here) that would allow the monkey to choose between these plans.

(b) Suppose the monkey knows that the probability that the box will collapse is inversely proportional to the cost of pushing the box a unit distance (and that he can sense this cost after pushing the box 1 unit distance). For example,

P(C) = 1.0 [C(push) x 0.01]
P(C | C(push) = 10) = 0.1
P(C | C(push) = 20) = 0.1
P(C | C(push) = 100) = 0.1

Repeat part (a) (in detail).

REFERENCES
AMBLER, A. P., H. G. BARROW, C. M. BROWN, R. M. BURSTALL, and R. J. POPPLESTONE. "A versatile system for computer-controlled assembly." Artificial Intelligence 6, 2, 1975, 129-156.
BALLARD, D. H. "Model-directed detection of ribs in chest radiographs." TR11, Computer Science Dept., U. Rochester, March 1978.
BOLLES, R. C. "Verification vision for programmable assembly." Proc., 5th IJCAI, August 1977, 569-575.
BUNDY, A. Artificial Intelligence: An Introductory Course. New York: North Holland, 1978.
DEGROOT, M. H. Optimal Statistical Decisions. New York: McGraw-Hill, 1970.
FAHLMAN, S. E. "A planning system for robot construction tasks." Artificial Intelligence 5, 1, Spring 1974, 1-49.
FELDMAN, J. A. and R. F. SPROULL. "Decision theory and artificial intelligence: II. The hungry monkey." Cognitive Science 1, 2, 1977, 158-192.
FELDMAN, J. A. and Y. YAKIMOVSKY. "Decision theory and artificial intelligence: I. A semantics-based region analyser." Artificial Intelligence 5, 4, 1974, 349-371.
FIKES, R. E. and N. J. NILSSON. "STRIPS: A new approach to the application of theorem proving to problem solving." Artificial Intelligence 2, 3/4, 1971, 189-208.
FIKES, R. E., P. E. HART, and N. J. NILSSON. "New directions in robot problem solving." In MI7, 1972a.
FIKES, R. E., P. E. HART, and N. J. NILSSON. "Learning and executing generalized robot plans." Artificial Intelligence 3, 4, 1972b, 251-288.
GARVEY, J. D. "Perceptual strategies for purposive vision." Technical Note 117, AI Center, SRI Int'l, 1976.
MACKWORTH, A. K. "Vision research strategy: Black magic, metaphors, mechanisms, miniworlds, and maps." In CVS, 1978.
MCCARTHY, J. and P. J. HAYES. "Some philosophical problems from the standpoint of artificial intelligence." In MI4, 1969.
NILSSON, N. J. Principles of Artificial Intelligence. Palo Alto, CA: Tioga Publishing Company, 1980.
RAIFFA, H. Decision Analysis. Reading, MA: Addison-Wesley, 1968.
SACERDOTI, E. D. "Planning in a hierarchy of abstraction spaces." Artificial Intelligence 5, 2, 1974, 115-135.
SACERDOTI, E. D. A Structure for Plans and Behavior. New York: Elsevier, 1977.
SHIRAI, Y. "Analyzing intensity arrays using knowledge about scenes." In PCV, 1975.
SHORTLIFFE, E. H. Computer-Based Medical Consultations: MYCIN. New York: American Elsevier, 1976.
SPROULL, R. F. "Strategy construction using a synthesis of heuristic and decision-theoretic methods." Ph.D. dissertation, Dept. of Computer Science, Stanford U., May 1977.
SUSSMAN, G. J. A Computer Model of Skill Acquisition. New York: American Elsevier, 1975.
TATE, A. "Generating project networks." Proc., 5th IJCAI, August 1977, 888983.
WARREN, D. H. D. "WARPLAN: A system for generating plans." Memo 76, Dept. of Computational Logic, U. Edinburgh, June 1974.


Appendix 1 Some Mathematical Tools


A1.1 COORDINATE SYSTEMS

A1.1.1 Cartesian

The familiar two- and three-dimensional rectangular (Cartesian) coordinate systems are the most generally useful ones in describing geometry for computer vision. Most common is a right-handed three-dimensional system (Fig. A1.1). The coordinates of a point are the perpendicular projections of its location onto the coordinate axes. The two-dimensional coordinate system divides two-dimensional space into quadrants; the three-dimensional system divides three-space into octants.
A1.1.2 Polar and Polar Space

Coordinate systems that measure locations partially in terms of angles are in many cases more natural than Cartesian coordinates. For instance, locations with respect to the pan-tilt head of a camera or a robot arm may most naturally be described using angles. Two- and three-dimensional polar coordinate systems are shown in Fig. A1.2.

Fig. A1.1 Cartesian coordinate systems.

Cartesian coordinates (x, y) and polar coordinates (ρ, θ):

$$x = \rho\cos\theta, \quad y = \rho\sin\theta; \qquad \rho = (x^2 + y^2)^{1/2}, \quad \theta = \tan^{-1}(y/x)$$

Cartesian coordinates (x, y, z) and polar space coordinates (ρ, α, β, γ):

$$(x, y, z) = (\rho\cos\alpha, \rho\cos\beta, \rho\cos\gamma); \qquad \rho = (x^2 + y^2 + z^2)^{1/2}, \quad \alpha = \cos^{-1}(x/\rho), \quad \beta = \cos^{-1}(y/\rho), \quad \gamma = \cos^{-1}(z/\rho)$$

In these coordinate systems, the Cartesian quadrants or octants in which points fall are often of interest because many trigonometric functions determine only an angle modulo π/2 or π (one or two quadrants), and more information is necessary to determine the quadrant. Familiar examples are the inverse angle functions (such as arctangent), whose results are ambiguous between two angles.
A1.1.3 Spherical and Cylindrical

The spherical and cylindrical systems are shown in Fig. A1.3.

Fig. A1.2 Polar and polar space coordinate systems.



Fig. A1.3 Spherical and cylindrical coordinate systems.

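Cartesian coordinates (x, y, z) and spherical coordinates (ρ, θ, φ):

$$x = \rho\sin\phi\cos\theta, \quad y = \rho\sin\phi\sin\theta = x\tan\theta, \quad z = \rho\cos\phi$$
$$\rho = (x^2 + y^2 + z^2)^{1/2}, \quad \theta = \tan^{-1}(y/x), \quad \phi = \cos^{-1}(z/\rho)$$

Cartesian coordinates (x, y, z) and cylindrical coordinates (r, θ, z):

$$x = r\cos\theta, \quad y = r\sin\theta, \quad z = z$$
$$r = (x^2 + y^2)^{1/2}, \quad \theta = \tan^{-1}(y/x), \quad z = z$$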
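A round-trip check in Python is a quick way to catch sign and quadrant mistakes in these conversions. The function names are illustrative; θ is taken as the azimuth in the x-y plane and φ as the angle from the z axis, and `atan2` again resolves the quadrant of θ.

```python
import math

def spherical_to_cartesian(rho, theta, phi):
    # theta: azimuth in the x-y plane; phi: angle measured from the +z axis
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def cartesian_to_spherical(x, y, z):
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)                       # quadrant-safe tan^-1(y / x)
    phi = math.acos(z / rho) if rho > 0 else 0.0   # cos^-1(z / rho); arbitrary at the origin
    return rho, theta, phi

def cylindrical_to_cartesian(r, theta, z):
    return r * math.cos(theta), r * math.sin(theta), z

def cartesian_to_cylindrical(x, y, z):
    return math.hypot(x, y), math.atan2(y, x), z

p = (-1.0, 2.0, 3.0)
print(cartesian_to_spherical(*p))
print(cartesian_to_cylindrical(*p))
```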


A1.1.4 Homogeneous Coordinates


Homogeneous coordinates are a very useful tool in computer vision (and computer graphics) because they allow many important geometric transformations to be represented uniformly and elegantly (see Section A1.7). Homogeneous coordinates are redundant: a point in Cartesian space is represented by a line in homogeneous (n + 1)-space. Thus each (unique) Cartesian coordinate point corresponds to infinitely many homogeneous coordinates.

Cartesian coordinates and homogeneous coordinates:

$$(x, y, z) \leftrightarrow (wx, wy, wz, w)$$
$$(x/w, y/w, z/w) \leftrightarrow (x, y, z, w)$$

Here x, y, z, and w are real numbers, wx, wy, and wz are the products of the two reals, and x/w and so on are the indicated quotients.
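The two directions of the correspondence can be written as a pair of small Python functions. The names are illustrative. The sketch shows that different choices of w give different 4-tuples for the same Cartesian point; it rejects w = 0, since no finite Cartesian point corresponds to it.

```python
def to_homogeneous(point, w=1.0):
    """(x, y, z) -> (wx, wy, wz, w) for any nonzero scale factor w."""
    x, y, z = point
    return (w * x, w * y, w * z, w)

def from_homogeneous(h):
    """(x, y, z, w) -> (x/w, y/w, z/w)."""
    x, y, z, w = h
    if w == 0:
        raise ValueError("w = 0: no finite Cartesian point corresponds")
    return (x / w, y / w, z / w)

# Different w values give different homogeneous 4-tuples for the same point.
a = to_homogeneous((1.0, 2.0, 3.0))          # (1.0, 2.0, 3.0, 1.0)
b = to_homogeneous((1.0, 2.0, 3.0), w=2.0)   # (2.0, 4.0, 6.0, 2.0)
print(from_homogeneous(a) == from_homogeneous(b))  # True
```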


A1.2 TRIGONOMETRY

A1.2.1 Plane Trigonometry

Referring to Fig. A1.4, define

sine: $\sin(A)$ (sometimes $\sin A$) $= a/c$
cosine: $\cos(A)$ (or $\cos A$) $= b/c$
tangent: $\tan(A)$ (or $\tan A$) $= a/b$

The inverse functions arcsin, arccos, and arctan (also written $\sin^{-1}$, $\cos^{-1}$, $\tan^{-1}$) map a value into an angle. There are many useful trigonometric identities; some of the most common are the following.

$$\tan(x) = \frac{\sin(x)}{\cos(x)}$$
$$\sin(x + y) = \sin(x)\cos(y) + \cos(x)\sin(y)$$
$$\cos(x + y) = \cos(x)\cos(y) - \sin(x)\sin(y)$$
$$\tan(x \pm y) = \frac{\tan(x) \pm \tan(y)}{1 \mp \tan(x)\tan(y)}$$

In any triangle with angles A, B, C opposite sides a, b, c, the Law of Sines holds:

$$\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C}$$

as does the Law of Cosines:

$$a^2 = b^2 + c^2 - 2bc\cos A$$
$$a = b\cos C + c\cos B$$

Fig. A1.4 Plane right triangle.
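The triangle laws can be checked numerically. This Python sketch (the function name is illustrative) recovers the angles of a triangle from its three sides with the Law of Cosines, then checks the Law of Sines and the relation a = b cos C + c cos B on a 3-4-5 right triangle, whose right angle is at C as in the definitions above.

```python
import math

def angles_from_sides(a, b, c):
    """Angles A, B, C (radians) opposite sides a, b, c, via the Law of Cosines."""
    A = math.acos((b * b + c * c - a * a) / (2 * b * c))
    B = math.acos((a * a + c * c - b * b) / (2 * a * c))
    C = math.pi - A - B          # the angles of a plane triangle sum to pi
    return A, B, C

a, b, c = 3.0, 4.0, 5.0
A, B, C = angles_from_sides(a, b, c)
print(math.degrees(C))                                    # close to 90: a right triangle
print(a / math.sin(A), b / math.sin(B), c / math.sin(C))  # Law of Sines: all close to 5
print(b * math.cos(C) + c * math.cos(B))                  # close to a = 3
```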

A1.2.2 Spherical Trigonometry

The sides of a spherical triangle (Fig. A1.5) are measured by the angle they subtend at the sphere center; its angles, by the angle they subtend on the face of the sphere. Some useful spherical trigonometric identities are the following.

$$\frac{\sin A}{\sin a} = \frac{\sin B}{\sin b} = \frac{\sin C}{\sin c}$$
$$\cos a = \cos b\cos c + \sin b\sin c\cos A$$
$$\cos a = \frac{\cos b\cos(c - \theta)}{\cos\theta}, \quad \text{where } \tan\theta = \tan b\cos A$$
$$\cos A = -\cos B\cos C + \sin B\sin C\cos a$$


A1.3 VECTORS

Vectors are both a notational convenience and a representation of a geometric concept. The familiar interpretation of a vector v as a directed line segment allows for a geometrical interpretation of many useful vector operations and properties. A more general notion of an n-dimensional vector v = (v1, v2, ..., vn) is that of an n-tuple abiding by mathematical laws of composition and transformation. A vector may be written horizontally (a row vector) or vertically (a column vector).

A point in space is characterized by its n coordinates, which are often written as a vector. A point at X, Y, Z coordinates x, y, and z is written as a vector x whose three components are (x, y, z). Such a vector may be visualized as a directed line segment, or arrow, with its tail at the origin of coordinates and its head at the point (x, y, z). The same vector may instead represent the direction in which it points toward the point (x, y, z) starting from the origin. An important type of direction vector is the normal vector, which is a vector in a direction perpendicular to a surface, plane, or line.

Vectors of equal dimension are equal if they are equal componentwise. Vectors may be multiplied by scalars. This corresponds to stretching or shrinking the vector arrow along its original direction.

$$\lambda\mathbf{x} = (\lambda x_1, \lambda x_2, \ldots, \lambda x_n)$$

Fig. A1.5 Spherical triangle.

Vector addition and subtraction are defined componentwise, only between vectors of equal dimension. Geometrically, to add two vectors x and y, put y's tail at x's head; the sum is the vector from x's tail to y's head. To subtract y from x, put y's head at x's head; the difference is the vector from x's tail to y's tail.

$$\mathbf{x} \pm \mathbf{y} = (x_1 \pm y_1, x_2 \pm y_2, \ldots, x_n \pm y_n)$$

The length (or magnitude) of a vector is computed by an n-dimensional version of Euclidean distance.

$$|\mathbf{x}| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2}$$

A vector of unit length is a unit vector. The unit vectors in the three usual Cartesian coordinate directions have special names.

$$\mathbf{i} = (1, 0, 0), \quad \mathbf{j} = (0, 1, 0), \quad \mathbf{k} = (0, 0, 1)$$

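The inner (or scalar, or dot) product of two vectors is defined as follows.

$$\mathbf{x} \cdot \mathbf{y} = |\mathbf{x}||\mathbf{y}|\cos\theta = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$

Here θ is the angle between the two vectors. The dot product of two nonzero vectors is 0 if and only if they are orthogonal (perpendicular). The projection of x onto y (the component of vector x in the direction y) is

$$|\mathbf{x}|\cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{|\mathbf{y}|}$$

Other identities of interest:

$$\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$$
$$\mathbf{x} \cdot (\mathbf{y} + \mathbf{z}) = \mathbf{x} \cdot \mathbf{y} + \mathbf{x} \cdot \mathbf{z}$$
$$\lambda(\mathbf{x} \cdot \mathbf{y}) = (\lambda\mathbf{x}) \cdot \mathbf{y} = \mathbf{x} \cdot (\lambda\mathbf{y})$$
$$\mathbf{x} \cdot \mathbf{x} = |\mathbf{x}|^2$$

The cross (or vector) product of two three-dimensional vectors is defined as follows.

$$\mathbf{x} \times \mathbf{y} = (x_2y_3 - x_3y_2,\; x_3y_1 - x_1y_3,\; x_1y_2 - x_2y_1)$$

Generally, the cross product of x and y is a vector perpendicular to both x and y. The magnitude of the cross product depends on the angle θ between the two vectors.

$$|\mathbf{x} \times \mathbf{y}| = |\mathbf{x}||\mathbf{y}|\sin\theta$$

Thus the magnitude of the product is zero for two nonzero vectors if and only if they are parallel.

470 App. 1 Some Mathematical Tools

Vectors and matrices allow many symbolic expressions to be written in a short, formal way. One such example is the formal determinant (Section A1.4). It expresses the definition of the cross product given above in a more easily remembered form.

$$\mathbf{x} \times \mathbf{y} = \det\begin{bmatrix}\mathbf{i} & \mathbf{j} & \mathbf{k}\\ x_1 & x_2 & x_3\\ y_1 & y_2 & y_3\end{bmatrix}$$

Also,

$$\mathbf{x} \times \mathbf{y} = -\mathbf{y} \times \mathbf{x}$$
$$\mathbf{x} \times (\mathbf{y} + \mathbf{z}) = \mathbf{x} \times \mathbf{y} + \mathbf{x} \times \mathbf{z}$$
$$\mathbf{i} \times \mathbf{j} = \mathbf{k}, \qquad \mathbf{j} \times \mathbf{k} = \mathbf{i}, \qquad \mathbf{k} \times \mathbf{i} = \mathbf{j}$$
$$\lambda(\mathbf{x} \times \mathbf{y}) = \lambda\mathbf{x} \times \mathbf{y} = \mathbf{x} \times \lambda\mathbf{y}$$

The triple scalar product is x · (y × z). It is equal to the value of the determinant

$$\det\begin{bmatrix}x_1 & x_2 & x_3\\ y_1 & y_2 & y_3\\ z_1 & z_2 & z_3\end{bmatrix}$$

The triple vector product is

$$\mathbf{x} \times (\mathbf{y} \times \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z})\mathbf{y} - (\mathbf{x} \cdot \mathbf{y})\mathbf{z}$$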
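The following minimal Python sketch implements the vector operations above, using plain tuples for three-vectors. Tuples are an assumption made to keep the example free of dependencies. The determinant is expanded by cofactors, independently of the cross product. As a result, the usage lines genuinely check the triple-product identities rather than restating them. All function names are ours.

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def cross(x, y):
    # (x2*y3 - x3*y2, x3*y1 - x1*y3, x1*y2 - x2*y1), with 0-based indices
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def scale(k, x):
    return tuple(k * xi for xi in x)

def sub(x, y):
    return tuple(xi - yi for xi, yi in zip(x, y))

def projection_length(x, y):
    """Component of x in the direction of y: |x| cos(theta) = x . y / |y|."""
    return dot(x, y) / norm(y)

def det3(m):
    """3 x 3 determinant by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

x, y, z = (1, 2, 3), (-2, 1, 4), (0, -1, 2)
print(dot(x, cross(y, z)), det3([x, y, z]))             # -> 20 20
print(cross(x, cross(y, z)))                            # -> (-8, 16, -8)
print(sub(scale(dot(x, z), y), scale(dot(x, y), z)))    # -> (-8, 16, -8)
```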

A1.4. MATRICES

A matrix A is a two-dimensional array of elements. If it has m rows and n columns, it is of dimension m × n, and the element in the ith row and jth column may be named $a_{ij}$. If m or n = 1, a row matrix or column matrix results, which is often called a vector. There is considerable punning among scalar, vector, and matrix representations and operations when the same dimensionality is involved (the 1 × 1 matrix may sometimes be treated as a scalar, for instance). Usually this practice is harmless, but occasionally the difference is important. A matrix is sometimes most naturally treated as a collection of vectors, and an m × n matrix M is sometimes written as

$$M = [\mathbf{a}_1\ \ \mathbf{a}_2\ \ \cdots\ \ \mathbf{a}_n]$$

or

$$M = \begin{bmatrix}\mathbf{b}_1\\ \mathbf{b}_2\\ \vdots\\ \mathbf{b}_m\end{bmatrix}$$

where the a's are column vectors and the b's are row vectors.

Two matrices A and B are equal if their dimensionality is the same and they are equal elementwise. Like a vector, a matrix may be multiplied (elementwise) by a scalar. Matrix addition and subtraction proceed elementwise between matrices of like dimensionality. For a scalar k and matrices A, B, and C of like dimensionality, the following is true.

$$A = kB \pm C \quad \text{if} \quad a_{ij} = kb_{ij} \pm c_{ij}, \qquad 1 \le i \le m,\ 1 \le j \le n$$

Two matrices A and B are conformable for multiplication if the number of columns of A equals the number of rows of B. The product is defined as C = AB, where an element $c_{ij}$ is defined by

$$c_{ij} = \sum_k a_{ik}b_{kj}$$

Thus each element of C is computed as an inner product of a row of A with a column of B. Matrix multiplication is associative but not, in general, commutative. The multiplicative identity in matrix algebra is called the identity matrix I. I is all zeros except that all elements on its main diagonal have value 1 ($a_{ij} = 1$ if $i = j$, else $a_{ij} = 0$). The n × n identity matrix is sometimes written $I_n$.

The transpose of an m × n matrix A is the n × m matrix $A^T$ such that the ijth element of A is the jith element of $A^T$. If $A^T = A$, A is symmetric.

The inverse matrix of an n × n matrix A is written $A^{-1}$. If it exists, then $AA^{-1} = A^{-1}A = I$. If its inverse does not exist, an n × n matrix is called singular.

Let k and p be scalars, and let A, B, and C be matrices whose dimensions make each operation defined. The following are some laws of matrix algebra (operations are matrix operations):

$$A + B = B + A$$
$$(A + B) + C = A + (B + C)$$
$$k(A + B) = kA + kB$$
$$(k + p)A = kA + pA$$
$$AB \ne BA \text{ in general}$$
$$(AB)C = A(BC)$$
$$A(B + C) = AB + AC$$
$$(A + B)C = AC + BC$$
$$A(kB) = k(AB) = (kA)B$$
$$I_mA = AI_n = A$$
$$(A + B)^T = A^T + B^T$$
$$(AB)^T = B^TA^T$$
$$(AB)^{-1} = B^{-1}A^{-1} \quad \text{(for invertible } A \text{ and } B\text{)}$$

The determinant of an n × n matrix is an important quantity; among other things, a matrix with zero determinant is singular. Let $A_{ij}$ be the (n − 1) × (n − 1) matrix that results from deleting the ith row and jth column of an n × n matrix A. The determinant of a 1 × 1 matrix is the value of its single element. For n > 1,

$$\det A = \sum_{i=1}^{n} a_{ij}(-1)^{i+j}\det A_{ij}$$

for any j between 1 and n. Given this definition of the determinant, the inverse of a matrix may be defined as

$$(A^{-1})_{ij} = \frac{(-1)^{i+j}\det A_{ji}}{\det A}$$
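The cofactor definitions translate directly into the recursive Python sketch below. It is exact with integers or with Python's `fractions.Fraction`. Its cost grows factorially with n, so it illustrates the definitions rather than the robust methods mentioned in the next paragraph. The function names are ours.

```python
from fractions import Fraction

def minor(A, i, j):
    """A_ij: A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion down the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # (-1)**i with 0-based i has the same parity as (-1)**(i+j) with 1-based i and j = 1.
    return sum(A[i][0] * (-1) ** i * det(minor(A, i, 0)) for i in range(n))

def inverse(A):
    """(A^-1)_ij = (-1)^(i+j) det A_ji / det A, returned as exact Fractions."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    n = len(A)
    return [[Fraction((-1) ** (i + j) * det(minor(A, j, i))) / d
             for j in range(n)] for i in range(n)]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print(det(A))        # -> 18
print(inverse(A))
```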

In practice, matrix inversion may be a difficult computational problem. However, this important algorithm has received much attention, and robust and efficient methods exist in the literature; many of them may also be used to compute the determinant. Many of the matrices arising in computer vision have to do with geometric transformations, and they have well-behaved inverses corresponding to the inverse transformations. Matrices of small dimensionality are usually quite computationally tractable.

Matrices are often used to denote linear transformations. Suppose a row (column) matrix X of dimension n is post- (pre-)multiplied by an n × n matrix A. The result X' = XA (X' = AX) is another row (column) matrix, each of whose elements is a linear combination of the elements of X, with the weights supplied by the values of A. Using the common pun between row matrices and vectors, x' = xA (x' = Ax) is often written for a linear transformation of a vector x.

An eigenvector of an n × n matrix A is a vector v such that, for some scalar λ (called an eigenvalue),

$$\mathbf{v}A = \lambda\mathbf{v}$$

That is, the linear transformation A operates on v just as a scaling operation. A matrix has n eigenvalues, but in general they may be complex and may be repeated. Computing the eigenvalues and eigenvectors of matrices is another computational problem of major importance, and good algorithms for general matrices are complicated. The n eigenvalues are the roots of the so-called characteristic polynomial, which results from setting a formal determinant to zero:

$$\det(A - \lambda I) = 0$$


Eigenvalues of matrices up to 4 × 4 may be found in closed form by solving the characteristic equation exactly. Often the matrices whose eigenvalues are of interest are symmetric, and luckily in this case the eigenvalues are all real. Many algorithms exist in the literature that compute eigenvalues and eigenvectors, both for symmetric and for general matrices.
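As one example of the symmetric-matrix algorithms alluded to above, here is a Python sketch of the classical cyclic Jacobi method. The choice of algorithm is ours; the text does not single one out. Each plane rotation J is chosen to zero one off-diagonal element, $A \leftarrow J^TAJ$, and the product of the rotations accumulates the eigenvectors. Forming J as a full matrix is wasteful, but it keeps the sketch short. For symmetric A, the row-vector relation $\mathbf{v}A = \lambda\mathbf{v}$ used in the text is the same as $A\mathbf{v} = \lambda\mathbf{v}$.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def jacobi_eigen(S, tol=1e-12, max_sweeps=50):
    """Eigenvalues and unit eigenvectors of a real symmetric matrix S.
    Returns (values, vectors), where vectors[m] belongs to values[m]."""
    n = len(S)
    A = [[float(v) for v in row] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Angle for which the rotated (p, q) element vanishes:
                # tan(2 theta) = 2 a_pq / (a_qq - a_pp).
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                A = matmul(transpose(J), matmul(A, J))
                V = matmul(V, J)
    values = [A[i][i] for i in range(n)]
    vectors = [[V[k][m] for k in range(n)] for m in range(n)]  # columns of V
    return values, vectors

S = [[2, 1, 0], [1, 2, 0], [0, 0, 3]]
vals, vecs = jacobi_eigen(S)
print(sorted(round(v, 9) for v in vals))   # -> [1.0, 3.0, 3.0]
```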
A1.5. LINES

An infinite line may be represented by several methods, each with its own advantages and limitations. One representation that is not often very useful is a pair of planes that intersect to form the line. The representations below have proven generally useful.
A1.5.1 Two Points

A two-dimensional or three-dimensional line is determined by two points on it, x1 and x2. (Throughout Appendix 1 this shorthand is used for "line in two-space" and "line in three-space"; similarly for "two- (three-) dimensional point.") This representation serves equally well for a half-line or a line segment. The two points can be kept as the rows of a 2 × n matrix.
A1.5.2 Point and Direction

A two-dimensional or three-dimensional line (or half-line) is determined by a point x on it (for a half-line, its endpoint) and a direction vector v along it. This representation is essentially the same as that of Section A1.5.1, but the interpretation of the vectors is different.
A1.5.3 Slope and Intercept

A two-dimensional line can often be represented by two numbers: the Y-value b where the line intersects the Y-axis, and the slope m of the line (the tangent of its inclination with the X-axis). This representation fails for vertical lines (those with infinite slope). The representation is an equation that makes explicit the dependence of y on x:

$$y = mx + b$$

A similar representation may of course be based on the X-intercept.
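A1.5.4 Ratios

A two-dimensional or three-dimensional line may be represented as an equation of ratios arising from two points $\mathbf{x}_1 = (x_1, y_1, z_1)$ and $\mathbf{x}_2 = (x_2, y_2, z_2)$ on the line.

$$\frac{x - x_1}{x_2 - x_1} = \frac{y - y_1}{y_2 - y_1} = \frac{z - z_1}{z_2 - z_1}$$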

A1.5.5 Normal and Distance from Origin (Line Equation)

This representation of two-dimensional lines is elegant because its parts have useful geometric significance, which extends to planes (but not to three-dimensional lines). The coefficients of the general two-dimensional linear equation represent a two-dimensional line. They also give its normal (perpendicular) vector and its (perpendicular) distance from the origin (Fig. A1.6). Let θ be the inclination of the line. From the ratio representation above, it is easy to derive (in two dimensions) that

$$(x - x_1)\sin\theta - (y - y_1)\cos\theta = 0$$

so for $d = -(x_1\sin\theta - y_1\cos\theta)$,

$$x\sin\theta - y\cos\theta + d = 0$$

This equation has the form of a dot product with a formal homogeneous vector (x, y, 1):

$$(x, y, 1) \cdot (\sin\theta, -\cos\theta, d) = 0$$

Here the two-dimensional vector (sin θ, −cos θ) is perpendicular to the line; it is, in fact, a unit normal vector. d is the signed distance, in the direction of the normal vector, from the line to the origin. Multiplying both sides of the equation by a constant leaves the line invariant, but destroys the interpretation of d as the distance to the origin.

This form of line representation has several advantages besides the interpretation of its parameters. The parameters never go to infinity (this is useful in the Hough algorithm described in Chapter 4). The representation extends naturally to planes in n-dimensional space. Least-squared error line fitting (Section A1.9) with this form of line equation minimizes errors perpendicular to the line. With the slope-intercept form, it would instead minimize errors perpendicular to one of the coordinate axes.

Fig. A1.6 Two-dimensional line with normal vector and distance to origin.
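The following short Python sketch uses the normal-and-distance form. It computes (sin θ, −cos θ, d) from two points, following the derivation above, and evaluates the dot product with (x, y, 1), which gives a point's signed distance from the line. The function names are ours. Unlike the slope-intercept form, the vertical line in the usage example needs no special case.

```python
import math

def line_from_points(p1, p2):
    """Coefficients (sin t, -cos t, d) of the 2-D line through p1 and p2,
    where t is the line's inclination."""
    (x1, y1), (x2, y2) = p1, p2
    t = math.atan2(y2 - y1, x2 - x1)
    s, c = math.sin(t), math.cos(t)
    d = -(x1 * s - y1 * c)
    return (s, -c, d)

def signed_distance(line, point):
    """Dot product of the line vector with the homogeneous point (x, y, 1):
    zero on the line, the signed perpendicular distance elsewhere."""
    a, b, d = line
    x, y = point
    return a * x + b * y + d

horizontal = line_from_points((0.0, 1.0), (5.0, 1.0))   # the line y = 1
vertical = line_from_points((3.0, 0.0), (3.0, 5.0))     # the line x = 3
print(horizontal, signed_distance(horizontal, (1.0, 4.0)))
print(vertical, signed_distance(vertical, (0.0, 0.0)))
```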


A1.5.6 Parametric

It is sometimes useful to be able to "walk along" a line mathematically by varying some parameter t. The basic parametric representation here follows from the two-point representation. If x1 and x2 are two particular points on the line, a general point on the line may be written as

$$\mathbf{x} = \mathbf{x}_1 + t(\mathbf{x}_2 - \mathbf{x}_1)$$

In matrix terms this is

$$\mathbf{x} = [t\ \ 1]\,L$$

where L is the 2 × n matrix whose first row is (x2 − x1) and whose second row is x1. Parametric representations based on points on the lines may be transformed by the geometric point transformations (Section A1.7).

A1.6. PLANES

The most common representation of planes uses the coefficients of the plane equation. This representation is an extension of the line-equation representation of Section A1.5.5. The plane equation may be written

$$ax + by + cz + d = 0$$

which is in the form of a dot product x · p = 0, with x = (x, y, z, 1). Four numbers given by p = (a, b, c, d) characterize a plane, and any homogeneous point x = (x, y, z, w) satisfying x · p = 0 lies in the plane. In p, the first three numbers (a, b, c) form a normal vector to the plane. If this normal vector is made a unit vector by scaling p, then d is the signed distance to the origin from the plane. Thus, with p so scaled, the dot product of the plane coefficient vector with any point (in homogeneous coordinates with w = 1) gives the signed distance of the point from the plane (Fig. A1.7).

Fig. A1.7 Distance from a point to a plane.


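Three noncollinear points $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ determine a plane p. To find it, write the points as homogeneous row vectors (x, y, z, 1) and set up

$$\begin{bmatrix}\mathbf{x}_1\\ \mathbf{x}_2\\ \mathbf{x}_3\\ 0\ \ 0\ \ 0\ \ 1\end{bmatrix}\mathbf{p}^T = \begin{bmatrix}0\\ 0\\ 0\\ 1\end{bmatrix}$$

If the matrix containing the point vectors can be inverted, the desired vector p is thus proportional to the fourth column of the inverse.

Three planes $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3$ may intersect in a point x. To find it, write

$$\mathbf{x}\,[\mathbf{p}_1^T\ \ \mathbf{p}_2^T\ \ \mathbf{p}_3^T\ \ (0, 0, 0, 1)^T] = [0\ \ 0\ \ 0\ \ 1]$$

If the matrix containing the plane vectors can be inverted, the desired point x is given by the fourth row of the inverse. If the planes do not intersect in a point, the inverse does not exist.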
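Both constructions amount to solving a 4 × 4 linear system whose right-hand side is (0, 0, 0, 1). Solving it yields the fourth column of the inverse (for the plane) or, through the transpose, the fourth row of the inverse (for the point), without forming the whole inverse. The Python sketch below uses a small Gaussian-elimination helper of our own. It makes the point-plane duality visible: the two functions differ only in what goes into the first three rows. As the construction implies, a plane through the origin makes the first system singular.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("matrix cannot be inverted")
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (A[i][n] - sum(A[i][k] * z[k] for k in range(i + 1, n))) / A[i][i]
    return z

E4 = [0, 0, 0, 1]

def plane_through(x1, x2, x3):
    """Plane p = (a, b, c, d) with (x_i, 1) . p = 0, scaled so that d = 1."""
    rows = [list(x1) + [1], list(x2) + [1], list(x3) + [1], E4]
    return solve(rows, E4)

def point_of_planes(p1, p2, p3):
    """Homogeneous point (x, y, z, 1) lying on all three planes."""
    return solve([list(p1), list(p2), list(p3), E4], E4)

print(plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1)))            # plane x + y + z = 1
print(point_of_planes((1, 0, 0, -1), (0, 1, 0, -2), (0, 0, 1, -3)))
```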

A1.7 GEOMETRIC TRANSFORMATIONS

This section contains some results that are well known through their central place in the computer graphics literature, where they are illustrated in greater detail. The idea is to use homogeneous coordinates so that important transformations (including affine and projective ones) can be written as linear transformations. The transformations of interest here map points or point sets onto other points or point sets. They include rotation, scaling, skewing, translation, and perspective distortion (point projection) (Fig. A1.8).

A point x in three-space is written as the homogeneous row four-vector (x, y, z, w), and postmultiplication by the following transformation matrices accomplishes point transformation. A set of m points may be represented as an m × 4 matrix of row point vectors, and the matrix multiplication then transforms all the points at once.
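A1.7.1 Rotation

Rotation is measured clockwise about the named axis while looking along the axis toward the origin.

Rotation by θ about the X-axis:

$$\begin{bmatrix}1 & 0 & 0 & 0\\ 0 & \cos\theta & -\sin\theta & 0\\ 0 & \sin\theta & \cos\theta & 0\\ 0 & 0 & 0 & 1\end{bmatrix}$$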


Fig. A1.8 Transformations: (a) original, (b) rotation, (c) scaling, (d) skewing, (e) translation, and (f) perspective.

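Rotation by θ about the Y-axis:

$$\begin{bmatrix}\cos\theta & 0 & \sin\theta & 0\\ 0 & 1 & 0 & 0\\ -\sin\theta & 0 & \cos\theta & 0\\ 0 & 0 & 0 & 1\end{bmatrix}$$

Rotation by θ about the Z-axis:

$$\begin{bmatrix}\cos\theta & -\sin\theta & 0 & 0\\ \sin\theta & \cos\theta & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}$$

A1.7.2 Scaling

Scaling stretches points out along the coordinate directions. Scaling can transform a cube into an arbitrary rectangular parallelepiped. Scale by $S_x$, $S_y$, and $S_z$ in the X, Y, and Z directions:

$$\begin{bmatrix}S_x & 0 & 0 & 0\\ 0 & S_y & 0 & 0\\ 0 & 0 & S_z & 0\\ 0 & 0 & 0 & 1\end{bmatrix}$$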

A1.7.3 Skewing

Skewing is a linear change in the coordinates of a point based on certain of its other coordinates. In a simple case, skewing can transform a square into a parallelogram:

$$\begin{bmatrix}1 & 0 & 0 & 0\\ d & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}$$

In general, skewing is quite powerful:

$$\begin{bmatrix}1 & d & e & 0\\ k & 1 & m & 0\\ n & p & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}$$

Rotation is a composition of scaling and skewing (Section A1.7.7).

A1.7.4 Translation

Translate a point by (t, u, v):

$$\begin{bmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ t & u & v & 1\end{bmatrix}$$

With a three-dimensional Cartesian point representation, this transformation is accomplished through vector addition, not matrix multiplication.


A1.7.5 Perspective

The properties of point projection, which model perspective distortion, were derived in Chapter 2. In this formulation the viewpoint is on the positive Z-axis at (0, 0, f, 1), looking toward the origin; f acts like a "focal length." The visible world is projected through the viewpoint onto the Z = 0 image plane (Fig. A1.9).

Fig. A1.9 Geometry of image formation.

Similar-triangles arguments show that the image plane point for any world point (x, y, z) is given by

$$(U, V) = \left(\frac{fx}{f - z}, \frac{fy}{f - z}\right)$$

Using homogeneous coordinates, a "perspective distortion" transformation can be written that distorts three-dimensional space. After orthographic projection of the distorted space onto the image plane, the result looks like the perspective projection required above. Roughly, the transformation shrinks the size of things as they get more distant in Z. The transformation is of course linear in homogeneous coordinates. The final step of changing to Cartesian coordinates, by dividing through by the fourth vector element, accomplishes the necessary nonlinear shrinking.

Perspective distortion (situation of Fig. A1.9):

$$\begin{bmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & -1/f\\ 0 & 0 & 0 & 1\end{bmatrix}$$

For example, (x, y, z, 1) maps to (x, y, z, 1 − z/f). Dividing by the fourth element gives (fx/(f − z), fy/(f − z), fz/(f − z)), whose first two components are the (U, V) above.

Perspective from a general viewpoint has nonzero elements in the entire fourth column, but this is just equivalent to a rotated coordinate system combined with the perspective distortion above (Section A1.7).
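The Python sketch below applies the homogeneous row-vector convention with a few of the matrices above. It covers rotation about Z (using the clockwise convention stated in Section A1.7.1), translation, and the perspective distortion for a viewpoint at (0, 0, f). The final division by the fourth element is what produces the nonlinear shrinking. The usage lines compare the result with the similar-triangles formula, and they show that a concatenated matrix acts like the successive transformations. The helper names are ours.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rotation_z(theta):
    """Rotation about Z, clockwise looking from +Z toward the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def translation(t, u, v):
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [t, u, v, 1]]

def perspective(f):
    """Perspective distortion for a viewpoint at (0, 0, f); image plane Z = 0."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, -1.0 / f], [0, 0, 0, 1]]

def transform(point, T):
    """Apply T to a Cartesian point written as the row vector (x, y, z, 1),
    then divide by the fourth element to return to Cartesian coordinates."""
    x, y, z, w = matmul([list(point) + [1]], T)[0]
    return (x / w, y / w, z / w)

f = 2.0
p = (1.0, 3.0, -4.0)
print(transform(p, perspective(f)))                    # first two entries are (U, V)
print(f * p[0] / (f - p[2]), f * p[1] / (f - p[2]))    # similar-triangles check

# Concatenation: x' = x T P is the same as applying T, then P.
D = matmul(translation(0, 0, -1), perspective(f))
print(transform(p, D), transform(transform(p, translation(0, 0, -1)), perspective(f)))
```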
A1.7.6 Transforming Lines and Planes

Line and plane equations may be operated on by linear transformations, just as points can. Point-based parametric representations of lines and planes transform just as points do, but the line and plane equation representations act differently. They have an elegant relation to the point transformation. Let T be a transformation matrix (3 × 3 for two dimensions, 4 × 4 for three dimensions) as defined in Sections A1.7.1 to A1.7.5. A point represented as a row vector is then transformed as

$$\mathbf{x}' = \mathbf{x}T$$

and the linear equation (line or plane), when represented as a column vector v, is transformed by

$$\mathbf{v}' = T^{-1}\mathbf{v}$$
A1.7.7 Summary

The 4 × 4 matrix formulation is a way to unify the representation and calculation of useful geometric transformations. These include rigid ones (rotation and translation) and nonrigid ones (scaling and skewing), as well as the projective. The semantics of the matrix are summarized in Fig. A1.10.

Since the result of applying a transformation to a row vector is another row vector, transformations may be concatenated by repeated matrix multiplication. Such composition of transformations follows the rules of matrix algebra (for instance, it is associative but not commutative). The semantics of

$$\mathbf{x}' = \mathbf{x}ABC$$

is that x' is the vector that results from applying transformation A to x, then B to the transformed x, then C to the twice-transformed x. The single 4 × 4 matrix D = ABC would do the same job. The inverses of geometric transformation matrices are just the matrices expressing the inverse transformations, and they are easy to derive.
A1.8. CAMERA CALIBRATION AND INVERSE PERSPECTIVE

The aim of this section is to explore the correspondence between world and image points. A (half-)line of sight in the world corresponds to each image point. Camera calibration permits prediction of where in the image a world point will appear. Inverse perspective transformation determines the line of sight corresponding to an image point. Suppose an inverse perspective transform is known, and a visible point is known to lie on a particular world plane (say the floor, or a planar beam of light). Then its precise three-dimensional coordinates may be found, since the line of sight generally intersects the world plane in just one point.

Fig. A1.10 The 4 × 4 homogeneous transformation matrix (regions labeled scale in X, scale in Y, scale in Z, skew, perspective, translate, and zoom).

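A1.8.1 Camera Calibration

This section is concerned with the "camera model," which takes the form of a 4 × 3 matrix mapping three-dimensional world points to two-dimensional image points. There are many ways to derive a camera model. The one given here is easy to state mathematically. In practice, a more general optimization technique such as hill climbing can be most effective in finding the camera parameters, since it can take advantage of any parameters that are already known and can reflect dependencies between them.

Let the image plane coordinates be U and V; in homogeneous coordinates an image plane point is (u, v, t). Thus

$$U = \frac{u}{t}, \qquad V = \frac{v}{t}$$

Call the desired camera model matrix C, with elements $C_{ij}$ and column four-vectors $C_j$. Then for any world point (x, y, z), a C is needed such that

$$(x, y, z, 1)\,C = (u, v, t)$$

So

$$u = (x, y, z, 1)\,C_1, \qquad v = (x, y, z, 1)\,C_2, \qquad t = (x, y, z, 1)\,C_3$$

Expanding the inner products and rewriting u − Ut = 0 and v − Vt = 0 gives

$$xC_{11} + yC_{21} + zC_{31} + C_{41} - UxC_{13} - UyC_{23} - UzC_{33} - UC_{43} = 0$$
$$xC_{12} + yC_{22} + zC_{32} + C_{42} - VxC_{13} - VyC_{23} - VzC_{33} - VC_{43} = 0$$

The overall scaling of C is irrelevant, thanks to the homogeneous formulation, so $C_{43}$ may be arbitrarily set to 1. Equations such as those above can then be written in matrix form, with one pair of rows per observed correspondence $(x_i, y_i, z_i) \leftrightarrow (U_i, V_i)$:

$$\begin{bmatrix}
x_1 & y_1 & z_1 & 1 & 0 & 0 & 0 & 0 & -U_1x_1 & -U_1y_1 & -U_1z_1\\
0 & 0 & 0 & 0 & x_1 & y_1 & z_1 & 1 & -V_1x_1 & -V_1y_1 & -V_1z_1\\
\vdots & & & & & & & & & & \vdots\\
x_n & y_n & z_n & 1 & 0 & 0 & 0 & 0 & -U_nx_n & -U_ny_n & -U_nz_n\\
0 & 0 & 0 & 0 & x_n & y_n & z_n & 1 & -V_nx_n & -V_ny_n & -V_nz_n
\end{bmatrix}
\begin{bmatrix}C_{11}\\ C_{21}\\ C_{31}\\ C_{41}\\ C_{12}\\ C_{22}\\ C_{32}\\ C_{42}\\ C_{13}\\ C_{23}\\ C_{33}\end{bmatrix}
=
\begin{bmatrix}U_1\\ V_1\\ \vdots\\ U_n\\ V_n\end{bmatrix}$$

Eleven such equations allow a solution for C. Two equations result for every association of an (x, y, z) point with a (U, V) point. Such an association must be established using visible objects of known location, often placed for the purpose. If six or more such observations are used (giving more than the eleven equations needed), a least-squared error solution to the overdetermined system may be obtained. This is done by using a pseudo-inverse to solve the resulting matrix equation (Section A1.9).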
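The Python sketch below implements the linear calibration above. Each observation contributes two rows, $C_{43}$ is fixed at 1, and the eleven unknowns are found by least squares through the normal equations (the pseudo-inverse solution of Section A1.9). The "true" camera matrix and the calibration points in the usage lines are invented test data. A real system would also have to cope with measurement noise and with badly conditioned point layouts, for example coplanar calibration points.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("singular system")
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (A[i][n] - sum(A[i][k] * z[k] for k in range(i + 1, n))) / A[i][i]
    return z

def project(C, p):
    """Image point (U, V) of world point p under the 4 x 3 camera matrix C."""
    h = list(p) + [1.0]
    u, v, t = (sum(h[i] * C[i][j] for i in range(4)) for j in range(3))
    return u / t, v / t

def calibrate(world, image):
    """Least-squares camera matrix from (x, y, z) <-> (U, V) pairs, C43 = 1."""
    rows, rhs = [], []
    for (x, y, z), (U, V) in zip(world, image):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -U * x, -U * y, -U * z])
        rhs.append(U)
        rows.append([0, 0, 0, 0, x, y, z, 1, -V * x, -V * y, -V * z])
        rhs.append(V)
    # Normal equations (A^T A) c = A^T b give the pseudo-inverse solution.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(11)] for i in range(11)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(11)]
    c = solve(AtA, Atb)
    # c = (C11, C21, C31, C41, C12, C22, C32, C42, C13, C23, C33)
    return [[c[0], c[4], c[8]],
            [c[1], c[5], c[9]],
            [c[2], c[6], c[10]],
            [c[3], c[7], 1.0]]

C_true = [[2.0, 0.1, 0.01],
          [0.2, 1.5, 0.02],
          [0.3, 0.4, -0.1],
          [5.0, 3.0, 1.0]]
world = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
         (1, 0, 1), (0, 1, 1), (1, 1, 1), (2, 1, 3)]
image = [project(C_true, p) for p in world]
C_est = calibrate(world, image)
print(C_est)
```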
A1.8.2 Inverse Perspective

Finding the world line corresponding to an image point relies on the fact that the perspective transformation matrix also affects the z-component of a world point. This information is lost when the z-component is projected away orthographically, but it encodes the relation between the focal point and the z-position of the point. Varying this third component references points whose world positions vary in z but which project onto the same position in the image. The line can be parameterized by a variable p that formally occupies the position of that z-coordinate in three-space, which has no physical meaning in imaging. Write the inverse perspective transform $P^{-1}$ as

$$(x', y', p, 1)\,P^{-1} = \left(x', y', p, 1 + \frac{p}{f}\right)$$

Rewriting this in the usual way gives the following relations for the (x, y, z) points on the line:

$$(x, y, z, 1) = \left(\frac{fx'}{f + p}, \frac{fy'}{f + p}, \frac{fp}{f + p}, 1\right)$$

Eliminating the parameter p between the expressions for z and x, and between those for z and y, leaves

$$x = \frac{x'}{f}(f - z), \qquad y = \frac{y'}{f}(f - z)$$

Thus x, y, and z are linearly related. As expected, all points on the inverse perspective transform of an image point lie on a line, and unsurprisingly both the viewpoint (0, 0, f) and the image point (x', y', 0) lie on it.

A camera matrix C determines the three-dimensional line that is the inverse perspective transform of any image point. Scale C so that $C_{43} = 1$, and let world points be written x = (x, y, z, 1) and image points u = (u, v, t). The actual image points are then

$$U = \frac{u}{t}, \qquad V = \frac{v}{t}$$

so u = Ut and v = Vt. Since u = xC,

$$u = Ut = \mathbf{x}C_1, \qquad v = Vt = \mathbf{x}C_2, \qquad t = \mathbf{x}C_3$$

Substituting the expression for t into those for u and v gives

$$U\,\mathbf{x}C_3 = \mathbf{x}C_1, \qquad V\,\mathbf{x}C_3 = \mathbf{x}C_2$$

which may be written

$$\mathbf{x}(C_1 - UC_3) = 0, \qquad \mathbf{x}(C_2 - VC_3) = 0$$

These two equations are in the form of plane equations. For any U, V in the image and camera model C, they determine two planes whose intersection gives the desired line. Writing the plane equations as

$$a_1x + b_1y + c_1z + d_1 = 0$$
$$a_2x + b_2y + c_2z + d_2 = 0$$

then

$$a_1 = C_{11} - C_{13}U, \qquad a_2 = C_{12} - C_{13}V$$

and so on. The direction (λ, μ, ν) of the intersection of two planes is given by the cross product of their normal vectors, which may now be written as

$$(\lambda, \mu, \nu) = (a_1, b_1, c_1) \times (a_2, b_2, c_2) = (b_1c_2 - b_2c_1,\; c_1a_2 - c_2a_1,\; a_1b_2 - a_2b_1)$$

Then if ν ≠ 0, for any particular $z_0$,

$$x_0 = \frac{b_1(c_2z_0 + d_2) - b_2(c_1z_0 + d_1)}{a_1b_2 - b_1a_2}, \qquad y_0 = \frac{a_2(c_1z_0 + d_1) - a_1(c_2z_0 + d_2)}{a_1b_2 - b_1a_2}$$

and the line may be written

$$\frac{x - x_0}{\lambda} = \frac{y - y_0}{\mu} = \frac{z - z_0}{\nu}$$
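The plane-intersection construction can be sketched directly in Python. `line_of_sight` returns a point on the line at height $z_0$ together with the direction (λ, μ, ν). The function, its names, and the test camera are illustrative assumptions of ours. The test camera is the Fig. A1.9 geometry with f = 2, written as the 4 × 3 camera matrix that maps (x, y, z, 1) to (x, y, 1 − z/f).

```python
def column(C, j):
    return [C[i][j] for i in range(4)]

def project(C, p):
    """Image point (U, V) of world point p under the 4 x 3 camera matrix C."""
    h = list(p) + [1.0]
    u, v, t = (sum(h[i] * C[i][j] for i in range(4)) for j in range(3))
    return u / t, v / t

def line_of_sight(C, U, V, z0=0.0):
    """Point at height z0 and direction (lambda, mu, nu) of the world line
    that projects to image point (U, V)."""
    C1, C2, C3 = column(C, 0), column(C, 1), column(C, 2)
    # Plane coefficients (a, b, c, d) from x.(C1 - U C3) = 0 and x.(C2 - V C3) = 0
    a1, b1, c1, d1 = [C1[i] - U * C3[i] for i in range(4)]
    a2, b2, c2, d2 = [C2[i] - V * C3[i] for i in range(4)]
    direction = (b1 * c2 - b2 * c1, c1 * a2 - c2 * a1, a1 * b2 - a2 * b1)
    den = a1 * b2 - b1 * a2          # the same quantity as nu
    if abs(den) < 1e-12:
        raise ValueError("nu = 0: the line does not cross z = z0 at a single point")
    x0 = (b1 * (c2 * z0 + d2) - b2 * (c1 * z0 + d1)) / den
    y0 = (a2 * (c1 * z0 + d1) - a1 * (c2 * z0 + d2)) / den
    return (x0, y0, z0), direction

f = 2.0
C = [[1, 0, 0], [0, 1, 0], [0, 0, -1 / f], [0, 0, 1]]   # Fig. A1.9 camera
p0, d = line_of_sight(C, 0.5, -0.25)
print(p0, d)   # passes through the image point (0.5, -0.25, 0) and the viewpoint (0, 0, f)
```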

A1.9. LEAST-SQUARED-ERROR FITTING

The problem of fitting a simple functional model to a set of data points is a common one, and it is the concern of this section. The first topic is the subproblem of fitting a straight line to a set of (x, y) points ("linear regression"). In computer vision, this line-fitting problem is encountered relatively often. Model-fitting methods try to find the "best" fit; that is, they minimize some error. Methods that yield closed-form, analytical solutions for such best fits are at issue here.

The relevant "error" to minimize is determined partly by assumptions about dependence between variables. If x is independent, the line may be represented as y = mx + b, and the error defined as the vertical displacement of a point from the line. Symmetrically, if x is dependent, horizontal error should be minimized. If neither variable is dependent, a reasonable error to minimize is the perpendicular distance from the points to the line. In this case either the line equation ax + by + 1 = 0 can be used with the method shown here, or the eigenvector approach of Section A1.9.2 may be used.
A1.9.1 Pseudo-Inverse Method

Suppose an n × 1 observations matrix Y is to be fitted by some linear model of p parameters. The prediction is that the linear model will approximate the actual data. Then

$$Y = XB + E$$

where X is an n × p formal independent-variable matrix, B is a p × 1 parameter matrix whose values are to be determined, and E represents the difference between the prediction and the actuality: it is an n × 1 error matrix. For example, to fit a straight line y = mx + b to some data points $(x_i, y_i)$, form Y as a column matrix of the $y_i$, and

$$X = \begin{bmatrix}1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n\end{bmatrix}, \qquad B = \begin{bmatrix}b\\ m\end{bmatrix}$$

Now the task is to find the parameter matrix B (above, the b and m that determine the straight line) that minimizes the error. The error is the sum of squared differences from the prediction: the sum of the squared elements of E, or $E^TE$ (if we do not mind conflating the one-element matrix with a scalar). The mathematically attractive properties of the squared-error definition are almost universally taken to compensate for whatever disadvantages it has relative to what is really meant by error (the absolute value, for example, is much harder to calculate with).

To minimize the error, simply differentiate it with respect to the elements of B and set the derivative to 0. The second derivative is positive, so this is indeed a minimum. These elementwise derivatives are written tersely in matrix form. First rewrite the error terms:

$$E^TE = (Y - XB)^T(Y - XB) = Y^TY - B^TX^TY - Y^TXB + B^TX^TXB = Y^TY - 2B^TX^TY + B^TX^TXB$$

(Here the combined terms were 1 × 1 matrices.) Now differentiate; setting the derivative to 0 yields

$$0 = X^TXB - X^TY$$

and thus

$$B = (X^TX)^{-1}X^TY = X^{\dagger}Y$$

where $X^{\dagger}$ is called the pseudo-inverse of X.

The pseudo-inverse method generalizes to fitting any parametrized model to data (Section A1.9.3). The model should be chosen with some care. For example, Fig. A1.11 shows a disturbing case in which the model above (minimizing vertical errors) is used to fit a relatively vertical swarm of points. The "best fit" line in this case is not the intuitive one.
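For the straight-line case the matrix $X^TX$ is only 2 × 2, so the pseudo-inverse solution $B = (X^TX)^{-1}X^TY$ can be written out in closed form. The Python sketch below does exactly that; the function name is ours.

```python
def fit_line_pseudo_inverse(xs, ys):
    """Least-squares y = m x + b (vertical errors) via B = (X^T X)^-1 X^T Y,
    where the rows of X are [1, x_i]."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T Y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular (are all x_i equal?)")
    b = (sxx * sy - sx * sxy) / det
    m = (n * sxy - sx * sy) / det
    return m, b

print(fit_line_pseudo_inverse([0, 1, 2, 3], [1, 3, 5, 7]))   # -> (2.0, 1.0)
```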
A1.9.2 Principal Axis Method

The principal axes and moments of a swarm of points determine the direction and amount of its dispersion in space. These concepts are familiar in physics as the principal axes and moments of inertia. Suppose a swarm of (possibly weighted) points is translated so that its center of mass (average location) is at the origin. A symmetric matrix M may then easily be calculated. Its eigenvectors determine the best-fit line or plane in a least-squared perpendicular error sense, and its eigenvalues tell how good the resulting fit is.

Given a set $\{\mathbf{x}'_i\}$ of row vectors with weights $w_i$, where $\mathbf{x}'_i = (x'_{i1}, x'_{i2}, x'_{i3})$, define their "scatter matrix" to be the symmetric matrix M:

$$M = \sum_i w_i\,\mathbf{x}_i'^T\mathbf{x}'_i, \qquad M_{kp} = \sum_i w_i\,x'_{ik}x'_{ip}, \quad 1 \le k, p \le 3$$

Define the dispersion of the $\mathbf{x}'_i$ in a direction v (i.e., "dispersion around the plane whose normal is v") to be the sum of the weighted squared lengths of the $\mathbf{x}'_i$ in the direction v. This squared error $E^2$ is

$$E^2 = \sum_i w_i(\mathbf{x}'_i \cdot \mathbf{v})^2 = \mathbf{v}\Big(\sum_i w_i\,\mathbf{x}_i'^T\mathbf{x}'_i\Big)\mathbf{v}^T = \mathbf{v}M\mathbf{v}^T$$

Fig. A1.11 A set of points and the "best fit" line minimizing error in Y.
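Here is a two-dimensional Python sketch of the principal axis method for line fitting. The text works in three dimensions; the 2 × 2 case is used here so that the eigenvalues and eigenvectors of the scatter matrix have a simple closed form. The scatter matrix is built from centered, optionally weighted points. The eigenvector of the smallest eigenvalue is the line normal, as this section goes on to explain, and a zero smallest eigenvalue means the points are exactly collinear. Unlike the vertical-error fit of Fig. A1.11, a vertical swarm causes no difficulty. The function name is ours.

```python
import math

def principal_axis_line(points, weights=None):
    """Best-fit 2-D line in the least-squared perpendicular error sense.
    Returns (centroid, direction, normal, (lambda_min, lambda_max))."""
    if weights is None:
        weights = [1.0] * len(points)
    W = sum(weights)
    cx = sum(w * x for w, (x, _) in zip(weights, points)) / W
    cy = sum(w * y for w, (_, y) in zip(weights, points)) / W
    # Scatter matrix M = sum_i w_i x_i'^T x_i' of the centered points.
    mxx = sum(w * (x - cx) ** 2 for w, (x, _) in zip(weights, points))
    myy = sum(w * (y - cy) ** 2 for w, (_, y) in zip(weights, points))
    mxy = sum(w * (x - cx) * (y - cy) for w, (x, y) in zip(weights, points))
    # Eigenvalues of [[mxx, mxy], [mxy, myy]] in closed form.
    half_trace = (mxx + myy) / 2
    R = math.hypot((mxx - myy) / 2, mxy)
    lam_min, lam_max = half_trace - R, half_trace + R
    # The eigenvector of lam_max is the line direction; the normal is
    # perpendicular to it and is the eigenvector of lam_min.
    phi = 0.5 * math.atan2(2 * mxy, mxx - myy)
    direction = (math.cos(phi), math.sin(phi))
    normal = (-math.sin(phi), math.cos(phi))
    return (cx, cy), direction, normal, (lam_min, lam_max)

pts = [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0, 3.0)]
print(principal_axis_line(pts))
print(principal_axis_line([(1.0, float(y)) for y in range(5)]))   # vertical swarm
```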

To find the direction of minimum dispersion (the normal to the best-fit line or plane), note that the minimum of $\mathbf{v}M\mathbf{v}^T$ over all unit vectors v is the minimum eigenvalue $\lambda_1$ of M. If $\mathbf{v}_1$ is the corresponding eigenvector, the minimum dispersion is attained at $\mathbf{v} = \mathbf{v}_1$. The best-fit line or plane of the points goes through the center of mass, which is at the origin. Inverting the translation that brought the centroid to the origin yields the best-fit line or plane for the original point swarm.

The eigenvectors correspond to dispersions in orthogonal directions, and the eigenvalues tell how much dispersion there is. Thus, with a three-dimensional point swarm, two large eigenvalues and one small one indicate a planar swarm whose normal is the eigenvector of the smallest eigenvalue. Two small eigenvalues and one large one indicate a line in the direction of the normal to the "worst-fit plane," that is, the eigenvector of the largest eigenvalue. (It can be proved that this is in fact the best-fit line in a least-squared perpendicular error sense.) Three equal eigenvalues indicate a "spherical" swarm.

A1.9.3 Fitting Curves by the Pseudo-Inverse Method

Suppose the value of a function f(x) is known at n points $x_1, \ldots, x_n$. It may be useful to fit f with a function g(x) of m parameters $(b_1, \ldots, b_m)$. If the squared error at a point $x_i$ is defined as

$$(e_i)^2 = [f(x_i) - g(x_i)]^2$$

then, for g linear in its parameters, a sequence of steps similar to that of Section A1.9.1 leads to setting a derivative to zero and obtaining

$$0 = G^TG\mathbf{b} - G^T\mathbf{f}$$

where b is the vector of parameters, f the vector of the n values of f(x), and

$$G = \begin{bmatrix}\dfrac{\partial g(x_1)}{\partial b_1} & \cdots & \dfrac{\partial g(x_1)}{\partial b_m}\\ \vdots & & \vdots\\ \dfrac{\partial g(x_n)}{\partial b_1} & \cdots & \dfrac{\partial g(x_n)}{\partial b_m}\end{bmatrix}$$

As before, this yields

$$\mathbf{b} = (G^TG)^{-1}G^T\mathbf{f}$$

Explicit least-squares solutions for curves can have nonintuitive behavior. In particular, say that a general circle is represented as

$$g(x, y) = x^2 + y^2 + 2Dx + 2Ey + F$$

This yields the values of D, E, and F that minimize

$$e^2 = \sum_{i=1}^{n} g(x_i, y_i)^2$$

for n input points. The error term being minimized does not turn out to accord with our intuitive one. It gives the intuitive distance of a point to the curve, but weighted by a factor roughly proportional to the radius of the curve, which is probably not desirable. The best-fit criterion thus favors curves with high average curvature, resulting in smaller circles than expected. In fitting ellipses, this error criterion favors more eccentric ones.

The most successful conic fitters abandon the luxury of a closed-form solution and turn to iterative minimization techniques. In these, the error measure is adjusted to compensate for the unwanted weighting, as follows:

$$e^2 = \sum_{i=1}^{n}\left[\frac{f(x_i, y_i)}{|\nabla f(x_i, y_i)|}\right]^2$$
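The closed-form circle fit the text warns about is itself a linear pseudo-inverse problem. Each point contributes a row $[2x_i, 2y_i, 1]$ and a target $-(x_i^2 + y_i^2)$. The Python sketch below solves the resulting 3 × 3 normal equations by Cramer's rule; the function names are ours. It minimizes the algebraic error $\sum g(x_i, y_i)^2$, not the gradient-weighted error, so the bias toward small circles described above still applies to noisy data. The iterative refinement is not shown. With exact points on a circle, the fit recovers it.

```python
import math

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def solve3(M, r):
    """Cramer's rule for a 3 x 3 system M z = r (fine at this size)."""
    d = det3(M)
    return [det3([row[:j] + [r[i]] + row[j + 1:] for i, row in enumerate(M)]) / d
            for j in range(3)]

def fit_circle_algebraic(points):
    """Minimize sum (x^2 + y^2 + 2Dx + 2Ey + F)^2; return (center, radius)."""
    rows = [[2 * x, 2 * y, 1.0] for x, y in points]
    rhs = [-(x * x + y * y) for x, y in points]
    GtG = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Gtf = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    D, E, F = solve3(GtG, Gtf)
    # (x + D)^2 + (y + E)^2 = D^2 + E^2 - F
    return (-D, -E), math.sqrt(D * D + E * E - F)

pts = [(2 + 3 * math.cos(t), -1 + 3 * math.sin(t)) for t in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]
print(fit_circle_algebraic(pts))   # center near (2, -1), radius near 3
```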

A1.10 CONICS

The conic sections are useful because they provide closed two-dimensional curves, they occur in many images, and they are well-behaved and familiar polynomials of low degree. This section gives their equations in standard form, illustrates how the general conic equation may be put into standard form, and presents some sample specific results for characterizing ellipses.

All the standard-form conics may be subjected to rotation, translation, and scaling to move them around on the plane. These operations on points affect the conic equation in a predictable way.

Circle (r = radius): $x^2 + y^2 = r^2$

Ellipse (a, b = major, minor semiaxes): $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$

Parabola ((p, 0) = focus, x = −p = directrix): $y^2 = 4px$

Hyperbola (vertices (±a, 0), asymptotes y = ±(b/a)x): $\dfrac{x^2}{a^2} - \dfrac{y^2}{b^2} = 1$

The general conic equation is

$$Ax^2 + 2Bxy + Cy^2 + 2Dx + 2Ey + F = 0$$

This equation may be written formally as

$$(x\ \ y\ \ 1)\begin{bmatrix}A & B & D\\ B & C & E\\ D & E & F\end{bmatrix}\begin{bmatrix}x\\ y\\ 1\end{bmatrix} = \mathbf{x}M\mathbf{x}^T = 0$$

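Putting the general conic equation into one of the standard forms is a common analytic geometry exercise. The symmetric 3 × 3 matrix M may be diagonalized, which eliminates the coefficients B, D, and E from the equation and brings it close to standard form. The diagonalization amounts to a rigid motion that puts the conic in a symmetric position at the origin. The transformation is in fact the 3 × 3 matrix E whose rows are eigenvectors of M, chosen orthonormal (which is possible because M is symmetric). Recall that if v is an eigenvector of M,

$$\mathbf{v}M = \lambda\mathbf{v}$$

Then if D is a diagonal matrix of the three eigenvalues $\lambda_1, \lambda_2, \lambda_3$,

$$EM = DE$$

but then

$$EME^T = D$$

and M has been transformed by a similarity transformation into a diagonal matrix such that

$$\mathbf{x}D\mathbf{x}^T = 0$$

This general idea is of course related to the principal axis calculation given in Section A1.9.2. It extends to three-dimensional quadric surfaces such as the ellipsoid, cone, hyperbolic paraboloid, and so forth. The general result given above has particular consequences, illustrated by the following facts about the ellipse. For these results the conic is written without the factors of 2 used above, as $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$. Given such an equation representing an ellipse, its center $(x_c, y_c)$ is given by

$$x_c = \frac{2CD - BE}{B^2 - 4AC}, \qquad y_c = \frac{2AE - BD}{B^2 - 4AC}$$

The orientation θ of a principal axis satisfies

$$\tan 2\theta = \frac{B}{A - C}$$

The squared semimajor and semiminor axes are (upper sign for the major axis)

$$\frac{-2G}{A + C \mp \left[B^2 + (A - C)^2\right]^{1/2}}$$

where

$$G = F - (Ax_c^2 + Bx_cy_c + Cy_c^2)$$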
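The ellipse results can be checked by building coefficients from a known ellipse and then recovering its parameters. The Python sketch below uses the form $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$ (no factors of 2), which is the form the ellipse results assume. For the orientation it returns the major-axis angle, $\theta = \tfrac{1}{2}\operatorname{atan2}(-B, C - A)$, which is one of the two solutions of $\tan 2\theta = B/(A - C)$. The function name and the sample ellipse are ours.

```python
import math

def ellipse_params(A, B, C, D, E, F):
    """Center, major-axis angle, and semiaxes of the ellipse
    A x^2 + B xy + C y^2 + D x + E y + F = 0 (no factors of 2)."""
    den = B * B - 4 * A * C                  # negative for an ellipse
    xc = (2 * C * D - B * E) / den
    yc = (2 * A * E - B * D) / den
    G = F - (A * xc * xc + B * xc * yc + C * yc * yc)
    root = math.sqrt(B * B + (A - C) ** 2)
    a = math.sqrt(-2 * G / (A + C - root))   # semimajor axis
    b = math.sqrt(-2 * G / (A + C + root))   # semiminor axis
    theta = 0.5 * math.atan2(-B, C - A)      # direction of the major axis
    return (xc, yc), theta, a, b

# Coefficients of a known ellipse: center (1, 2), semiaxes 3 and 2,
# major axis at 30 degrees, obtained by expanding the rotated standard form.
xc0, yc0, a0, b0, phi = 1.0, 2.0, 3.0, 2.0, math.radians(30)
p, q = 1 / a0 ** 2, 1 / b0 ** 2
cs, sn = math.cos(phi), math.sin(phi)
A = p * cs * cs + q * sn * sn
C = p * sn * sn + q * cs * cs
B = 2 * (p - q) * cs * sn
D = -2 * A * xc0 - B * yc0
E = -B * xc0 - 2 * C * yc0
F = A * xc0 ** 2 + B * xc0 * yc0 + C * yc0 ** 2 - 1
print(ellipse_params(A, B, C, D, E, F))
```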

A1.11 INTERPOLATION

Interpolation fits data by giving values between known data points. Usually, the interpolating function passes through each given data point. Many interpolation methods are known; one of the simplest is Lagrangean interpolation.

A1.11.1 One-Dimensional

Given n + 1 points $(x_j, y_j)$ with $x_0 < x_1 < \cdots < x_n$, the idea is to produce an nth-degree polynomial involving n + 1 so-called Lagrangean coefficients. It is

$$f(x) = \sum_{j=0}^{n} L_j(x)\,y_j$$

where $L_j(x)$ is the jth coefficient:

$$L_j(x) = \frac{(x - x_0)(x - x_1)\cdots(x - x_{j-1})(x - x_{j+1})\cdots(x - x_n)}{(x_j - x_0)(x_j - x_1)\cdots(x_j - x_{j-1})(x_j - x_{j+1})\cdots(x_j - x_n)}$$

Other interpolative schemes include divided differences, Hermite interpolation (for use when function derivatives are also known), and splines. Any polynomial interpolation rule can produce surprising results if the function being interpolated does not behave locally like a polynomial.

Fig. A1.12 Four-point Lagrangean interpolation on a rectangular grid (corners $(x_0, y_0)$, $(x_1, y_0)$, $(x_0, y_1)$, $(x_1, y_1)$; fractional offsets p and q).
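The following direct Python sketch of the Lagrangean formula builds each coefficient $L_j(x)$ as the product given in the text, so the cost is $O(n^2)$ per evaluation. Because n + 1 nodes determine a polynomial of degree n exactly, interpolating a cubic from four samples reproduces the cubic. The function names and sample data are ours.

```python
def lagrange(xs, ys, x):
    """f(x) = sum_j L_j(x) y_j for distinct nodes xs and values ys."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        Lj = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                Lj *= (x - xk) / (xj - xk)
        total += Lj * yj
    return total

def cubic(x):
    return x ** 3 - 2 * x + 1

nodes = [0.0, 1.0, 2.0, 3.0]
values = [cubic(t) for t in nodes]
print(lagrange(nodes, values, 1.5), cubic(1.5))   # the two agree up to rounding
```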
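A1.11.2 Two-Dimensional

The four-point Lagrangean method is for the situation shown in Fig. A1.12. Let $f_{ij} = f(x_i, y_j)$, $k = x_1 - x_0$, and $h = y_1 - y_0$. Then, for $0 \le p, q \le 1$,

$$f(x_0 + qk,\ y_0 + ph) = (1 - p)(1 - q)f_{00} + q(1 - p)f_{10} + p(1 - q)f_{01} + pqf_{11}$$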
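Here is the four-point formula in Python. `four_point` is a literal transcription of the formula. `interpolate_grid` is a hypothetical helper of ours that assumes unit grid spacing (k = h = 1) and stores `grid[j][i]` as f at the integer point (i, j); it is meant only for points strictly inside the grid. A function that is itself bilinear in x and y is reproduced exactly.

```python
def four_point(f00, f10, f01, f11, p, q):
    """f(x0 + q k, y0 + p h) from corner values f_ij = f(x_i, y_j);
    q is the fractional offset in x, p the fractional offset in y."""
    return ((1 - p) * (1 - q) * f00 + q * (1 - p) * f10
            + p * (1 - q) * f01 + p * q * f11)

def interpolate_grid(grid, x, y):
    """Hypothetical helper: grid[j][i] = f(i, j) on a unit-spaced grid;
    (x, y) must lie strictly inside the grid."""
    i, j = int(x), int(y)
    q, p = x - i, y - j
    return four_point(grid[j][i], grid[j][i + 1],
                      grid[j + 1][i], grid[j + 1][i + 1], p, q)

def g(x, y):
    return 2 + 3 * x - y + 0.5 * x * y    # bilinear, so reproduced exactly

grid = [[g(i, j) for i in range(3)] for j in range(3)]
print(interpolate_grid(grid, 0.25, 1.5), g(0.25, 1.5))
```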
A1.12 THE FAST FOURIER TRANSFORM

The following routine computes the discrete Fourier transform of a one-dimensional complex array XIn of length $N = 2^{\mathrm{LogN}}$ and produces the one-dimensional complex array XOut. It uses an array W of the N complex Nth roots of unity, computed as shown, and an array Bits containing a bit-reversal table of length N. N, LogN, W, and Bits are all global to the subroutine as written. If the logical variable Forward is TRUE, the FFT is performed; if Forward is FALSE, the inverse FFT is performed.

      SUBROUTINE FFT(XIn, XOut, Forward)
      GLOBAL W, Bits, N, LogN
      LOGICAL Forward
      COMPLEX XIn, XOut, W, A, B
      INTEGER Bits
      ARRAY(0:N-1) W, Bits, XIn, XOut
C     Copy the input into bit-reversed order.
      DO (I = 0, N-1) XOut(I) = XIn(Bits(I))
      JOff = N/2
      JPnt = N/2
      JBk = 2
      IOff = 1
C     LogN passes of butterflies; the block size JBk doubles each pass.
      DO (I = 1, LogN)
         DO (IStart = 0, N-1, JBk)
            JWPnt = 0
            DO (K = IStart, IStart+IOff-1)
               WHEN (Forward)
                  A = XOut(K+IOff)*W(JWPnt) + XOut(K)
                  B = XOut(K+IOff)*W(JWPnt+JOff) + XOut(K)
               FIN
               ELSE
                  A = XOut(K+IOff)*CONJG(W(JWPnt)) + XOut(K)
                  B = XOut(K+IOff)*CONJG(W(JWPnt+JOff)) + XOut(K)
               FIN
               XOut(K) = A
               XOut(K+IOff) = B
               JWPnt = JWPnt + JPnt
            FIN
         FIN
         JPnt = JPnt/2
         IOff = JBk
         JBk = JBk*2
      FIN
C     The inverse transform is scaled by 1/N.
      UNLESS (Forward)
         DO (I = 0, N-1) XOut(I) = XOut(I)/N
      FIN
      END

      TO INITW
         Pi = 3.14159265
         DO (K = 0, N-1)
            Theta = 2*Pi/N
            W(K) = CMPLX(COS(Theta*K), SIN(Theta*K))
         FIN
      FIN

      TO BITREV
         Bits(0) = 0
         M = 1
         DO (I = 0, LogN-1)
            DO (J = 0, M-1)
               Bits(J) = Bits(J)*2
               Bits(J+M) = Bits(J) + 1
            FIN
            M = M*2
         FIN
      FIN
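For readers who want to experiment, here is a Python transliteration of the routine above; the translation and its names are ours. It keeps the routine's conventions: the forward transform uses $W(k) = e^{+2\pi i k/N}$, and the inverse uses the conjugate and divides by N. Some libraries use the opposite sign for the forward transform, so compare results with care. The usage check compares the output against a direct evaluation of the same DFT.

```python
import cmath

def fft(xin, forward=True):
    """Radix-2 decimation-in-time FFT mirroring the routine above."""
    n = len(xin)
    log_n = n.bit_length() - 1
    if n == 0 or 1 << log_n != n:
        raise ValueError("length must be a power of two")
    # W(k) = exp(+2 pi i k / N), as in INITW; the inverse uses the conjugate.
    w = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
    if not forward:
        w = [wk.conjugate() for wk in w]
    # Bit-reversal table, built the same way as in BITREV.
    bits = [0]
    for _ in range(log_n):
        bits = [2 * b for b in bits] + [2 * b + 1 for b in bits]
    xout = [xin[bits[i]] for i in range(n)]
    j_off, j_pnt, j_bk, i_off = n // 2, n // 2, 2, 1
    for _ in range(log_n):
        for i_start in range(0, n, j_bk):
            j_w = 0
            for k in range(i_start, i_start + i_off):
                t = xout[k + i_off]
                a = t * w[j_w] + xout[k]
                b = t * w[j_w + j_off] + xout[k]
                xout[k], xout[k + i_off] = a, b
                j_w += j_pnt
        j_pnt //= 2
        i_off = j_bk
        j_bk *= 2
    if not forward:
        xout = [v / n for v in xout]
    return xout

print(fft([1, 0, 0, 0]))   # impulse -> all ones
```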

A1.13 THE ICOSAHEDRON

Geodesic dome constructions provide a useful way to partition the sphere (and hence the three-dimensional directions) into relatively uniform patches. The resulting polyhedra look like those of Fig. A1.13.

The icosahedron has 12 vertices, 20 faces, and 30 edges. Let its center be at the origin of Cartesian coordinates, and let each vertex be a unit distance from the center. Define

$$\tau = \text{the golden ratio} = \frac{1 + \sqrt{5}}{2}$$

$$a = \frac{\tau^{1/2}}{5^{1/4}}, \qquad b = \frac{1}{\tau^{1/2}\,5^{1/4}}$$

$$c = 2a + b, \qquad d = a + b = \frac{\tau^{3/2}}{5^{1/4}}$$

$$A = \text{angle subtended by an edge at the origin} = \arccos\left(\frac{1}{\sqrt{5}}\right)$$

Fig. A1.13 Multifaceted polyhedra from the icosahedron.

Then

angle between a radius and an edge = arccos(b)
edge length = 2b
distance from origin to center of edge = a
distance from origin to center of face = d/√3

The 12 vertices may be placed at

(0, ±a, ±b), (±b, 0, ±a), (±a, ±b, 0)

Then the midpoints of the 20 faces are given by

(1/3)(±d, ±d, ±d), (1/3)(0, ±a, ±c), (1/3)(±c, 0, ±a), (1/3)(±a, ±c, 0)

To subdivide icosahedral faces further, several methods suggest themselves. The simplest is to divide each edge into n equal lengths and then construct n² congruent equilateral triangles on each face, pushing them out to the radius of the sphere for their final position. (There are better methods than this if more uniform face sizes are desired.)
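The constants can be checked mechanically. The Python sketch below builds the 12 vertices from a and b, as placed in the text. It finds the 30 edges as the vertex pairs at distance 2b, and the 20 faces as mutually adjacent triples. It then compares the face centroids with the listed midpoints, using c = 2a + b, the value consistent with these vertex coordinates. The code and its names are ours.

```python
import itertools
import math

tau = (1 + math.sqrt(5)) / 2
a = math.sqrt(tau) / 5 ** 0.25
b = 1 / (math.sqrt(tau) * 5 ** 0.25)
c = 2 * a + b
d = a + b

def signs(*vals):
    """All sign combinations of the nonzero entries of vals."""
    choices = [(v, -v) if v != 0 else (0.0,) for v in vals]
    return [tuple(p) for p in itertools.product(*choices)]

vertices = signs(0, a, b) + signs(b, 0, a) + signs(a, b, 0)
edges = [(i, j) for i, j in itertools.combinations(range(12), 2)
         if abs(math.dist(vertices[i], vertices[j]) - 2 * b) < 1e-9]
adjacent = {i: set() for i in range(12)}
for i, j in edges:
    adjacent[i].add(j)
    adjacent[j].add(i)
faces = [(i, j, k) for i, j, k in itertools.combinations(range(12), 3)
         if j in adjacent[i] and k in adjacent[i] and k in adjacent[j]]

listed = [tuple(x / 3 for x in s)
          for s in signs(d, d, d) + signs(0, a, c) + signs(c, 0, a) + signs(a, c, 0)]
centroids = [tuple(sum(vertices[i][k] for i in f) / 3 for k in range(3)) for f in faces]
print(len(vertices), len(edges), len(faces))   # -> 12 30 20
```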

A1.14 ROOT FINDING

Since polynomials of fifth and higher degree are not, in general, soluble in closed form, numerical (approximate) solutions are useful for them as well as for nonpolynomial functions. The Newton-Raphson method produces successive approximations to a real root of a differentiable function of one variable:

$$x^{i+1} = x^{i} - \frac{f(x^{i})}{f'(x^{i})}$$

Here $x^i$ is the $i$th approximation to the root, and $f(x^i)$ and $f'(x^i)$ are the function and its derivative evaluated at $x^i$. The new approximation to the root is $x^{i+1}$. The successive generation of approximations can stop when they converge to a single value. The convergence to a root is governed by the choice of initial approximation to the root and by the behavior of the function in the vicinity of the root. For instance, several roots close together can cause problems.

The one-dimensional form of this method extends in a natural way to solving systems of simultaneous nonlinear equations. Given n functions $F_i$, each of n parameters, the problem is to find the set of parameters that drives all the functions to zero. Write the parameter vector x.
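A minimal sketch of the one-dimensional iteration x <- x - f(x)/f'(x), assuming the caller supplies the derivative. The stopping test (successive values agreeing to a relative tolerance) and the iteration cap are choices made for this illustration, not part of the method itself.

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until successive values agree."""
    x = x0
    for _ in range(max_iter):
        d = df(x)
        if d == 0:
            raise ZeroDivisionError(f"f'(x) is zero at x = {x!r}")
        x_new = x - f(x) / d
        # Stop when two successive approximations agree to within tol,
        # relative to the size of x (with an absolute floor of tol).
        if abs(x_new - x) <= tol * max(1.0, abs(x_new)):
            return x_new
        x = x_new
    raise RuntimeError("no convergence; try a different starting value")


# A quintic, for which no general closed-form solution exists.
root = newton_raphson(lambda x: x**5 - x - 1, lambda x: 5 * x**4 - 1, 1.0)
```

With a double root, such as f(x) = (x - 1)^2, the derivative vanishes at the root and convergence slows from quadratic to linear, one concrete form of the trouble that closely spaced roots cause.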

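$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Form the function column vector F such that

$$\mathbf{F}(\mathbf{x}) = \begin{bmatrix} F_1(\mathbf{x}) \\ F_2(\mathbf{x}) \\ \vdots \\ F_n(\mathbf{x}) \end{bmatrix}$$

The Jacobian matrix J is defined as

$$J = \begin{bmatrix}
\dfrac{\partial F_1}{\partial x_1} & \dfrac{\partial F_1}{\partial x_2} & \cdots & \dfrac{\partial F_1}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial F_n}{\partial x_1} & \dfrac{\partial F_n}{\partial x_2} & \cdots & \dfrac{\partial F_n}{\partial x_n}
\end{bmatrix}$$

Then the extension of the Newton-Raphson formula is

$$\mathbf{x}^{i+1} = \mathbf{x}^{i} - J^{-1}(\mathbf{x}^{i})\,\mathbf{F}(\mathbf{x}^{i})$$

which requires one matrix inversion per iteration.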
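The sketch below implements the vector iteration x <- x - J^-1(x) F(x). Instead of forming the inverse explicitly it solves the linear system J(x) s = F(x) for the step s, which is the usual practice because it is cheaper and numerically better behaved; the small Gaussian-elimination solver is written out only to keep the example free of dependencies. The example system (a circle intersected with the line x = y) is chosen purely for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-15:
            raise ValueError("Jacobian is singular at this point")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """x <- x - J(x)^-1 F(x), computed by solving J(x) s = F(x) for s."""
    x = list(x0)
    for _ in range(max_iter):
        step = solve_linear(J(x), F(x))
        x = [xi - si for xi, si in zip(x, step)]
        if max(abs(s) for s in step) <= tol:
            return x
    raise RuntimeError("no convergence; try a different starting vector")


# Example: intersect the circle x^2 + y^2 = 4 with the line x = y.
def F(v):
    x, y = v
    return [x * x + y * y - 4, x - y]


def J(v):
    x, y = v
    return [[2 * x, 2 * y], [1.0, -1.0]]


solution = newton_system(F, J, [1.0, 0.5])
```

Starting from (1, 0.5) the iteration converges to (√2, √2); starting from a point with x = -y would make J singular, illustrating the sensitivity to the initial approximation noted above.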

EXERCISES

A1.1 x and y are two two-dimensional vectors placed tail to tail. Prove that the area of the triangle they define is $|\mathbf{x} \times \mathbf{y}|/2$.

A1.2 Show that points q in a plane defined by the three points x, y, and z are given by $\mathbf{q} \cdot [(\mathbf{y} - \mathbf{x}) \times (\mathbf{z} - \mathbf{x})] = \mathbf{x} \cdot (\mathbf{y} \times \mathbf{z})$.

A1.3 Verify that the vector triple product may be written as claimed in its definition.

A1.4 Given an arctangent routine, write an arcsine routine.

A1.5 Show that the closed form for the inverse of a 2 × 2 matrix A is

$$A^{-1} = \frac{1}{\det A}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$$

A1.6 Prove by trigonometry that the matrix transformations for rotation are correct.

A1.7 What geometric transformation is accomplished when $a_{44}$ of a geometric transformation matrix A varies from unity?

A1.8 Establish conversions between the given line representations.

A1.9 Write a geometric transform to mirror points about a given plane.

A1.10 What is the line equation representation of a line L1 through a point x and perpendicular to a line L2 (similarly represented)? Parallel to L2?

A1.11 Derive the ellipse results given in Section A1.10.

A1.12 Explicitly derive the values of D, E, and F minimizing the error term in the general equation for a circle, $x^2 + y^2 + 2Dx + 2Ey + F = 0$.

A1.13 Show that if points and lines are transformed as shown in Section A1.7.6, the transformed points indeed lie on the transformed lines.

A1.14 Explicitly derive the least-squared error solution for lines represented as $ax + by + 1 = 0$.

A1.15 If three planes intersect in a point, is the inverse of

$$\begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 & \begin{matrix} 0 \\ 0 \\ 0 \\ 1 \end{matrix} \end{bmatrix}$$

guaranteed to exist?

A1.16 What is the angle between two three-space lines?

A1.17 In two dimensions, show that two lines u and v intersect at a point x given by $\mathbf{x} = \mathbf{u} \times \mathbf{v}$.

A1.18 How can you tell if two line segments (defined by their endpoints) intersect in the plane?

A1.19 Find a 4 × 4 matrix that transforms an arbitrary direction (or point) to lie on the Z axis.

A1.20 Derive a parametric representation for planes based on three points lying in the plane.

A1.21 Devise a scheme for interpolation on a triangular grid.

A1.22 What does the homogeneous point (x, y, z, 0) represent?

REFERENCES AND FURTHER READING

Computer Graphics
1. NEWMAN, W. M., and R. F. SPROULL. Principles of Interactive Computer Graphics, 2nd Ed. New York: McGraw-Hill, 1979.
2. CHASEN, S. H. Geometric Principles and Procedures for Computer Graphic Applications. Englewood Cliffs, NJ: Prentice-Hall, 1978.
3. FAUX, I. D., and M. J. PRATT. Computational Geometry for Design and Manufacture. Chichester, UK: Ellis Horwood Ltd, 1979.
4. ROGERS, D. F., and J. A. ADAMS. Mathematical Elements for Computer Graphics. New York: McGraw-Hill, 1976.

Computer Vision

5. HORN, B. K. P. "VISMEM: Vision Flash 34." AI Lab, MIT, December 1972.
6. SOBEL, I. "Camera models and machine perception." AIM-21, Stanford AI Lab, May 1970.
7. DUDA, R. O., and P. E. HART. Pattern Classification and Scene Analysis. New York: Wiley, 1973.
8. ROSENFELD, A., and A. C. KAK. Digital Picture Processing. New York: Academic Press, 1976.
9. PAVLIDIS, T. Structural Pattern Recognition. New York: Springer-Verlag, 1977.

Geometry, Calculus, Numerical Analysis

10. WEXLER, C. Analytic Geometry: A Vector Approach. Reading, MA: Addison-Wesley, 1962.
11. APOSTOL, T. M. Calculus, Vol. 2. Waltham, MA: Blaisdell, 1962.
12. CONTE, S. D., and C. DE BOOR. Elementary Numerical Analysis: An Algorithmic Approach. New York: McGraw-Hill, 1972.
13. RALSTON, A. A First Course in Numerical Analysis. New York: McGraw-Hill, 1965.
14. ABRAMOWITZ, M., and I. A. STEGUN. Handbook of Mathematical Functions. New York: Dover, 1964.
15. HODGEMAN, C. D. (Ed.). CRC Standard Mathematical Tables. West Palm Beach, FL: CRC Press.

Geodesic Tesselations

16. BROWN, C. "Fast display of well-tesselated surfaces." Computers and Graphics 4, 1979, 77-85.
17. CLINTON, J. D. "Advanced structural geometry studies, part I: polyhedral subdivision concepts for structural applications." NASA CR-1734/35, September 1971.

Fast Transformations

18. PRATT, W. K. Digital Image Processing. New York: Wiley-Interscience, 1978.
19. ANDREWS, H. C., and J. KANE. "Kronecker matrices, computer implementation, and generalized spectra." J. ACM, April 1970.


Appendix 2

Advanced Control Mechanisms
This appendix is concerned with specific control mechanisms that are provided by programming languages or that may be implemented on top of existing languages as aids to doing computer vision. The treatment here is brief; our aim is to expose the reader to several ideas for control of computer programs that have been developed in the artificial intelligence context, and to indicate how they relate to the main computational goals of computer vision.

A2.1 STANDARD CONTROL STRUCTURES

For completeness, we mention the control mechanisms that are provided as a matter of course by conventional research programming languages, such as Pascal, Algol, POP-2, SAIL, and PL/1. The influential language LISP, which provides a base language for many of the most advanced control mechanisms in computer vision, ironically is itself missing (in its pure form) a substantial number of these more standard constructs. Another common language missing some standard control mechanisms is SNOBOL. These standard constructions are so basic to the current conception of a serial von Neumann computer that they are often realized in the instruction set of the machine. In this sense we are almost talking here of computer hardware. The standard mechanisms are the following:

1. Sequence. Advance the program counter to the next instruction.
2. Branch instruction. Go to a specific address.
3. Conditional branch. Go to a specific address if a condition is true; otherwise, go to the next instruction.
4. Iteration. Repeat a sequence of instructions until a condition is met.


5. Subroutines. Go to a certain location; execute a set of instructions using a set of supplied parameters; then return to the next instruction after the subroutine call.

All the standard control structures should be in the tool kit of a programmer. They will be used, together with the data structures and data types supplied in the working language, to implement other control mechanisms. The remainder of this appendix deals with "nonstandard" control mechanisms: those not typically provided in commercial programming languages and which have no close correlates in primitive machine instructions. Nonstandard control mechanisms, although not at all domain specific, have developed to meet needs that are not the "lowest common denominator" of computer programming. They impose their own view of problem decomposition just as do the standard structures.

Less standard mechanisms are recursion and coroutining. Coroutining can be thought of as a form of recursion.

A2.1.1 Recursion

Recursion obeys all the constraints of subroutining, except that a routine may call upon "itself." The user sees no difference between recursive and nonrecursive subroutines, but internally recursion requires slightly more bookkeeping to be performed in the language software, since typically the hardware of a computer does not extend to managing recursion (although some machines have instructions that are quite useful here). A typical use of a recursive control paradigm in computer vision might be:

    To UnderstandScene(X);
    (
        If ImmediatelyApparent(X) then ReportUnderstandingOf(X);
        else
        (
            SimplerParts ← Decompose(X);
            ForEachPart in SimplerParts UnderstandScene(Part);
        )
    )
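The UnderstandScene paradigm translates directly into a recursive function. In the Python sketch below, `immediately_apparent` and `decompose` are hypothetical stand-ins invented for illustration: a "scene" is a nested list, a non-list element counts as immediately apparent, and a list decomposes into its elements. `report` plays the role of ReportUnderstandingOf.

```python
def immediately_apparent(scene):
    # Hypothetical stand-in: anything that is not a list is a primitive
    # the system can recognize directly.
    return not isinstance(scene, list)


def decompose(scene):
    # Hypothetical stand-in: a composite scene splits into its elements.
    return list(scene)


def understand_scene(scene, report):
    """Recursive control paradigm: report primitives, recurse on parts."""
    if immediately_apparent(scene):
        report(scene)                        # ReportUnderstandingOf(X)
    else:
        for part in decompose(scene):        # SimplerParts <- Decompose(X)
            understand_scene(part, report)   # each call finishes before the next


understood = []
understand_scene(["sky", ["house", ["door", "window"]], "tree"], understood.append)
```

As the text notes, nothing here differs conceptually from subroutining: each recursive call runs to completion before control returns to its caller.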

Recursion is an elegant way to specify many important algorithms (such as tree traversals), but in a way it has no conceptual differences from subroutining. A routine is broken up into subroutines (some of which may involve smaller versions of the original task); these are attacked sequentially, and they must finish before they return control to the routine that invokes them.

A2.1.2 Coroutining

Coroutines are simply programs that can call (invoke) each other. Most high-level languages do not directly provide coroutines, and thus they are a nonstandard control structure. However, coroutining is a fundamental concept [Knuth 1973] and serves here as a bridge between standard and nonstandard control mechanisms. Subroutines and their calling programs have a "slave-master" aspect: control is always returned to the master calling program after the subroutine has carried out its job. This mechanism not only leads to efficiencies by reducing the amount of executable code, but is considered to be so useful that it is built into the instruction set of most computers. The pervasiveness of subroutining has subtle effects on the approach to problem decomposition, encouraging a hierarchical subproblem structure.

The coroutine relationship is more egalitarian than the subroutine relationship. If coroutine A needs the services of coroutine B, it can call B, and (here is the difference) conversely, B can call A if B needs A's services.

Here is a simple (sounding) problem [Floyd 1979]: "Read lines of text, until a completely blank line is found. Eliminate redundant blanks between the words. Print the text, 30 characters to a line, without breaking words between lines." This problem is hard to program elegantly in most languages because the iterations involved do not nest well (try it!). However, an elegant solution exists if the job is decomposed into three coroutines, calling each other to perform input, formatting, and output of a character stream.

A useful paradigm for problem solving, besides the strictly hierarchical, is that of a "heterarchical" community of experts, each performing a job and when necessary calling on other experts. A heterarchy can be implemented by coroutines. Many of the nonstandard mechanisms discussed below are in the spirit of coroutines.
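Floyd's problem can be sketched with Python generators, which are a restricted (asymmetric, or "semi-") form of coroutine: each stage resumes the stage before it whenever it needs another character or word, so the three loops never have to be nested inside one another. The split into input, blank-squeezing, and line-filling stages follows the three-way decomposition described in the text; the function names and the treatment of an over-long word (placed on a line by itself) are assumptions made for this sketch.

```python
def read_chars(lines):
    """Input stage: yield characters until a completely blank line."""
    for line in lines:
        if line.strip() == "":
            return
        yield from line.rstrip("\n")
        yield " "                    # end of line counts as a blank


def squeeze_words(chars):
    """Formatting stage: collapse runs of blanks, yielding whole words."""
    word = []
    for ch in chars:
        if ch == " ":
            if word:
                yield "".join(word)
                word = []
        else:
            word.append(ch)
    if word:
        yield "".join(word)


def fill_lines(words, width=30):
    """Output stage: pack words into lines of at most `width` characters.
    A word longer than `width` is placed on a line by itself."""
    line = ""
    for w in words:
        if not line:
            line = w
        elif len(line) + 1 + len(w) <= width:
            line += " " + w
        else:
            yield line
            line = w
    if line:
        yield line


text = ["The  quick brown   fox jumps over",
        "the lazy dog and   keeps running",
        "   ",
        "this line is never read"]
lines_out = list(fill_lines(squeeze_words(read_chars(text))))
```

Here `lines_out` holds the two 30-character lines "The quick brown fox jumps over" and "the lazy dog and keeps running"; the text after the blank line is never consumed.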

A2.2 INHERENTLY SEQUENTIAL MECHANISMS

A2.2.1 Automatic Backtracking

The PLANNER language [Hewitt 1972] implicitly implemented the feature of "automatic backtracking." The advisability of uniformly using this technique, which is equivalent to depth-first search, was questioned by those who wished to give the programmer greater freedom to choose which task to activate next [Sussman and McDermott 1972].

A basic backtracking discipline may be provided by recursive calls, in which a return to a higher level is a "backtrack." The features of automatic backtracking are predicated on an ability to save and reinstate the computational state of a process automatically, without explicit specification by the programmer.

Automatic backtracking has its problems. One basic problem occurs in systems that perform inferences while following a particular line of reasoning which may ultimately be unsuccessful. The problem is that along the way, perhaps many perfectly valid and useful computations were performed and many facts were added to the internal model. Mixed in with these, of course, are wrong deductions which ultimately cause the line of reasoning to fail. The problem: After having restored control to a higher decision point after a failure is noticed, how is the system to know which deductions were valid and which invalid? One expensive way suggested by automatic backtracking is to keep track of all hypotheses that contributed to deriving each fact. Then one can remove all results of failed deduction paths. This is generally the wrong thing to do; modern trends have abandoned the automatic backtracking idea and allow the programmer some control over what is restored upon failure-driven backtracking. Typically, a compromise is implemented in which the programmer may mark certain hypotheses for deletion upon backtracking.
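A small Python sketch of both ideas: recursive calls provide the backtracking (returning from a failed call is the "backtrack"), and each fact added while a choice is being explored is recorded on a trail with a flag saying whether it should be retracted if that choice fails. The region-labeling problem, the compatibility rule, and the two kinds of fact are invented for illustration.

```python
def label_regions(regions, labels, compatible, facts, assignment=None):
    """Depth-first labeling with backtracking via recursion.

    `facts` is a shared data base (a set).  Each fact added while trying
    a choice is recorded on a trail together with a flag saying whether
    it must be retracted if that choice later fails.
    """
    if assignment is None:
        assignment = {}
    if len(assignment) == len(regions):
        return dict(assignment)                     # every region labeled
    region = regions[len(assignment)]
    for label in labels:
        if not all(compatible(region, label, r, l)
                   for r, l in assignment.items()):
            continue
        assignment[region] = label
        trail = [
            ((region, "is", label), True),   # hypothesis: retract on failure
            ((region, "examined"), False),   # observation: survives failure
        ]
        for fact, _ in trail:
            facts.add(fact)
        result = label_regions(regions, labels, compatible, facts, assignment)
        if result is not None:
            return result
        # Failure below this choice: returning here is the backtrack.
        del assignment[region]
        for fact, retract in trail:
            if retract:
                facts.discard(fact)
    return None


# Three mutually adjacent regions; region "c" may not take label 3.
adjacent = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c")]}


def compatible(region, label, other, other_label):
    if region == "c" and label == 3:
        return False
    return not (frozenset((region, other)) in adjacent and label == other_label)


facts = set()
result = label_regions(["a", "b", "c"], [1, 2, 3], compatible, facts)
```

The first attempt, b = 2, leaves no label for c; on backtracking, the marked hypothesis ("b", "is", 2) is retracted while the unmarked observation ("b", "examined") is kept, and the search succeeds with b = 3, c = 2.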
A2.2.2 Context Switching

Context switching is a general term that is used to mean switching of general process state (a control primitive) or switching a data base context (a data access primitive). The two ideas are not independent, because it could be confusing for a process to put itself to sleep and be reawakened in a totally different data context.

Backtracking is one use of general control context switching. The most general capability is a "general GOTO." A regular GOTO allows one to go only to a particular location defined in a static program. After the GOTO, all bindings and return points are still determined by the current state of processing. In contrast, a general GOTO allows a transfer not only across program "space," but through program "time" as well. Just as a regular GOTO can go to a predefined program label, a general GOTO can go to a "tag" which is created to save the entire state of a process. To GOTO such a tag is to go back in time and recreate the local binding, access, control, and process state of the process that made the tag. A good example of the use of such power is given in a problem-solving program that constructs complex structures of blocks [Fahlman 1974].
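The data-access half of context switching can be sketched with Python's `collections.ChainMap`, where each context is a layer over its parent: lookups see the parent's facts, but new assertions go only into the current layer, so switching back to the parent hides them. This illustrates only data base contexts; it does not attempt a general GOTO over control state. The `ContextDB` class and its method names are hypothetical.

```python
from collections import ChainMap


class ContextDB:
    """Data base whose current context can be switched.

    Each context is a ChainMap layer over its parent: lookups see the
    parent's facts, but assertions go only into the current layer.
    """

    def __init__(self):
        self.contexts = {"global": ChainMap()}
        self.current = "global"

    def push(self, name):
        """Create a child of the current context and switch to it."""
        self.contexts[name] = self.contexts[self.current].new_child()
        self.current = name

    def switch(self, name):
        self.current = name

    def assert_fact(self, key, value):
        self.contexts[self.current][key] = value   # writes go to maps[0]

    def lookup(self, key):
        return self.contexts[self.current].get(key)


db = ContextDB()
db.assert_fact("sky", "blue")
db.push("hypothesis-1")
db.assert_fact("door", "open")
```

In the "hypothesis-1" context both facts are visible; after `db.switch("global")` the lookup of "door" returns None until the process switches back.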

A2.3 SEQUENTIAL OR PARALLEL MECHANISMS

Some language constructs explicitly designate parallel computing. They may actually reflect a parallel computing environment, but more often they control a simulated version in which several control paths are maintained and multiprocessed under system control. Examples here are the module and message primitives given below and statements such as the COBEGIN, COEND pairs which can bracket notionally parallel blocks of code in some Algol-like language extensions.
A2.3.1 Modules and Messages

Modules and messages form a useful, versatile control paradigm that is relatively noncommittal. That is, it forces no particular problem decomposition or methodological style on its user, as does a pure subroutine paradigm, for example. Message passing is a general and elegant model of control which can be used to subsume others, such as subroutining, recursion, coroutining, and parallelism [Feldman 1979]. There are many antecedents to the mechanism of modules communicating by messages described here. They include [Feldman and Sproull 1971; Hewitt and Smith 1975; Goldberg and Kay 1976; Birtwhistle et al. 1973]. In the formulation presented by Hewitt, the message-passing paradigm can be extended down into the lowest level of machine architecture. The construction outlined here [Feldman 1979] is more moderate, since in it the base programming language may be used with its full power, and is itself not module and message based.

A program is made up of modules. A module is a piece of code with associated local data. The crucial point is that the internal state of a module (e.g., its data) is not accessible to other modules. Within a module, the base programming language, such as Algol, may be used to its full power (subroutine calls, recursion, iteration, and so forth are allowed). However, modules may not in any sense "call upon" each other. Modules communicate only by means of messages. A module may send a message to another module; the message may be a request for service, an informational message, a signal, or whatever. The module to whom the message is sent may, when it is ready, receive the message and process it, and may then itself send messages either to the original module, or indeed to any combination of other modules.

The module-message paradigm has several advantages over subroutine (or coroutine) calls.

1. If subroutines are in different languages, the subroutine call mechanisms must be made compatible.
2. Any sophisticated lockout mechanism for resource access requires the internal coding of queues equivalent to that which a message switcher provides.
3. A subroutine that tries to execute a locked subroutine is unable to proceed with other computation.
4. Having a resource always allocated by a single controlling module greatly simplifies all the common exclusion problems.
5. For inherently distributed resources, message communication is natural.

Module-valued slots provide a very flexible but safe discipline for control transfers.
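A minimal sketch of modules that interact only through a message switcher. Messages are dictionaries of named slots, so a receiver can supply a default for an omitted slot and simply ignore slots it does not understand. `Switcher`, `Module`, `AreaServer`, `Client`, and the slot names are all hypothetical; this is not Feldman's construction, only an illustration of the discipline it describes.

```python
from collections import deque


class Switcher:
    """Message switcher: the only path by which modules interact."""

    def __init__(self):
        self.modules = {}
        self.queue = deque()

    def register(self, name, module):
        self.modules[name] = module
        module.name, module.switcher = name, self

    def post(self, to, slots):
        self.queue.append((to, slots))

    def run(self):
        """Deliver messages one at a time until none remain."""
        while self.queue:
            to, slots = self.queue.popleft()
            self.modules[to].receive(slots)


class Module:
    name = None
    switcher = None

    def send(self, to, **slots):
        # Every message carries its sender so a reply can be routed back.
        self.switcher.post(to, dict(slots, sender=self.name))

    def receive(self, slots):
        raise NotImplementedError


class AreaServer(Module):
    """Hypothetical module whose region table is private state."""

    def __init__(self, areas):
        self._areas = dict(areas)

    def receive(self, slots):
        if slots.get("request") != "area":
            return                                   # not for us; ignore
        region = slots.get("region", "whole")        # default for a missing slot
        self.send(slots["sender"], reply="area", region=region,
                  value=self._areas.get(region))


class Client(Module):
    def __init__(self):
        self.answers = {}

    def receive(self, slots):
        if slots.get("reply") == "area":
            self.answers[slots["region"]] = slots["value"]


sw = Switcher()
server, client = AreaServer({"r1": 120, "whole": 512}), Client()
sw.register("areas", server)
sw.register("client", client)
client.send("areas", request="area", region="r1", color="red")  # extra slot ignored
client.send("areas", request="area")                             # region omitted
sw.run()
```

After `sw.run()`, `client.answers` maps "r1" to 120 and "whole" to 512. The server's table is never touched by the client, and the reply message could carry further slots (a multiple-valued answer) without changing any calling convention.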
Another view of messages is as a generalization of parameter lists in subroutine or coroutine calls. The idea of explicitly naming parameters is common in assembly languages, where the total number of parameters to a routine may be very large. More important, the message discipline presents to a module a collection of suggested parameters rather than automatically filling in the values of parameters. This leads naturally to the use of semantic checks on the consistency of parameters and to the use of default values for unspecified ones, which can be a substantial improvement on type checking. The use of return messages allows multiple-valued functions; an answer message may have several slots. Messages solve the so-called "uniform reference problem": one need not be concerned with whether an answer (say an array element) is computed by a procedure or a table.

There is yet another useful view of messages. One can view a message as a partially specified relation (or pattern), with some slot values filled in and some unbound. This is common in relational data bases [Astrahan et al. 1976] and artificial intelligence languages [Bobrow and Raphael 1974]. In this view, a message is a task specification with some recipient and some complaint departments to talk to about it. Various modules can attempt to satisfy or contract out parts of the task of filling in the remaining slots. A module may handle messages containing slots unknown to it. This allows several modules to work together on a task while maintaining locality. For example, an executive module could route messages (on the basis of a few slots that it understands) to modules that deal with special aspects of a problem using different slots in the message.

There is no apparent conflict between these varying views of messages. It is too early in their development to be sure, but the combined power of these paradigms seems to provide a qualitative improvement in our ability to develop vision programs.

A2.3.2 Priority Job Queue

In any system of independent processes on a serial computer, there must be a mechanism for scheduling activation. One general mechanism for accomplishing scheduling is the priority job queue. Priority queues are a well-known abstraction [Aho et al. 1974]. Informally, a priority job queue is just an ordered list of processes to be activated. A monitor program is responsible for dequeueing processes and executing them; processes do not give control directly to other processes, but only to the monitor. The only way for a process to initiate another is to enqueue it in the job queue. It is easiest to implement a priority job queue if processes are definable entities in the programming language being used; in other words, programs should be manipulable data types. This is possible in LISP and POP-2, for example.

If a process needs another job performed by another process, it enqueues the subjob on the job queue and suspends itself (it is deactivated, or put to sleep). The subjob, when it is dequeued and executed by the monitor, must explicitly enqueue the "calling" process if a subroutining effect is desired. Thus along with usual arguments telling a job what data to work on, a job queue discipline implies passing of control information.
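The discipline just described can be sketched with Python's `heapq` module, since Python functions are manipulable data. In the sketch the monitor is the only code that transfers control to a job; a job that needs a subjob enqueues it together with its own continuation (the control information) and returns, and the subjob explicitly re-enqueues that continuation. The job names and priorities are invented for illustration.

```python
import heapq
import itertools


class JobQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # FIFO among equal priorities

    def enqueue(self, priority, job, *args):
        # heapq is a min-heap, so store -priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), job, args))

    def run(self):
        """The monitor: the only code that ever transfers control to a job."""
        while self._heap:
            _, _, job, args = heapq.heappop(self._heap)
            job(self, *args)


log = []


def caller(q):
    log.append("caller needs an edge map")
    # Enqueue the subjob with our continuation as control information,
    # then return (i.e., suspend) instead of calling it directly.
    q.enqueue(5, edge_finder, "image-1", caller_resumes)


def edge_finder(q, image, resume):
    log.append("edge_finder works on " + image)
    q.enqueue(5, resume, "edges of " + image)   # explicitly re-enqueue the caller


def caller_resumes(q, result):
    log.append("caller resumes with " + result)


def background(q):
    log.append("background job")


q = JobQueue()
q.enqueue(1, background)
q.enqueue(10, caller)
q.run()
```

The low-priority background job runs only after the caller, its subjob, and the resumed caller have all been dispatched.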
Job queues are a general implementational technique useful for simulating other types of control mechanisms, such as active knowledge (Chapter 12). Also, a job queue can be used to switch between jobs which are notionally executing in parallel, as is common in multiprocessing systems. In this case sufficient information must be maintained to start the job at arbitrary points in its execution.

An example of a priority job queue is a program [Ballard 1978] that locates ribs in chest radiographs. The program maintains a relational model of the rib cage including geometric and procedural knowledge. Uninstantiated model nodes corresponding to ribs might be called hypotheses that those ribs exist. Associated with each hypothesis is a set of procedures that may, under various conditions, be used to verify it (i.e., to find a rib). Procedures carry information about preconditions that must be true in order that they may be executed, and about how to compute estimates of their utility once executed. These descriptive components allow an executive program to rank the procedures by expected usefulness at a given time.
Fig. A2.1 The rib-finding process in action (see text).

There is an initial action that is likely to succeed (locating a particular rib that is usually obvious in the x-ray). In heterarchical fashion, further actions use the results of previous actions. Once the initial rib has been found, its neighbors (both above and below and directly across the body midline) become eligible for consideration. Eligible rib-finding procedures correspond to short-term plans; they are all put on a job queue to be considered by an executive program that must compute the expected utility of expending computational energy on verifying one of the hypotheses by running one of the jobs. The executive computes a priority on the jobs based on how likely they are to succeed, using the utility functions and parameters associated with the individual nodes in the rib model (the individual hypotheses) and the current state of knowledge. The executive not only picks a hypothesis but also the procedure that should be able to verify it with least effort.

The hypothesis is either "verified," "not verified," or "some evidence is found." Verifying a hypothesis results in related hypotheses (about the neighboring ribs) becoming eligible for consideration. The information found during the verification process is used in several ways that can affect the utility of other procedures. The position of the rib with respect to instantiated neighbors is used to adjust horizontal and vertical scale factors governing the predicted size of the rib cage. The position of the rib affects the predicted range of locations for other unfound ribs. The shape of the rib also affects the search region for uninstantiated rib neighbors. If some evidence is found for the rib, but not enough to warrant an instantiation, the rib hypothesis is left on the active list and the rib model node is not instantiated. Rib hypotheses left on the active list will be reconsidered by the executive, which may try them again on the basis of new evidence.

The sequence of figures (Fig. A2.1, p. 503) shows a few steps in the finding of ribs using this program. Figure A2.1a shows the input data.
Figure A2.1b shows rectangles enclosing the lung field and the initial area to be searched for a particular rib which is usually findable. Only one rib-finding procedure is applicable for ribs with no neighbors found, so it is invoked and the rib shown by dark boxes in Fig. A2.1b is found. Predicted locations for neighboring ribs are generated and are used in order by the executive, which invokes the rib-finding procedures in order of expected utility (Fig. A2.1c-e). Predicted locations are shown by dots, actual locations by crosses; in Fig. A2.1f, all modelled ribs are found. The type of procedure that found the rib is denoted by the symbol used to draw in the rib. Figure A2.1f shows the final rib borders superimposed on the data.
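A much-simplified sketch of the executive's loop, for illustration only: the most promising eligible hypothesis is tried first, and each verified rib makes its neighbors eligible. The rib names, neighbor table, utility values, and `verify` test are invented; utilities are fixed when a hypothesis becomes eligible (the program described above re-estimates them as scale factors and search regions change), and the "some evidence is found" outcome that leaves a hypothesis on the active list is omitted.

```python
import heapq


def find_ribs(neighbors, utility, verify, start):
    """Verify the most promising eligible hypothesis first; each success
    makes its neighbors eligible.  Returns ribs in the order found."""
    found, order = set(), []
    agenda = [(-utility(start, found), start)]
    queued = {start}
    while agenda:
        _, rib = heapq.heappop(agenda)
        if not verify(rib):
            continue                          # "not verified": drop it
        found.add(rib)
        order.append(rib)
        for n in neighbors.get(rib, ()):      # neighbors become eligible
            if n not in queued:
                queued.add(n)
                heapq.heappush(agenda, (-utility(n, found), n))
    return order


neighbors = {"L3": ["L2", "L4", "R3"], "L2": ["L1", "L3"],
             "L4": ["L3", "L5"], "R3": ["R2", "R4", "L3"]}
U = {"L3": 1.0, "L2": 0.9, "R3": 0.8, "L4": 0.7,
     "R2": 0.6, "L1": 0.5, "L5": 0.4, "R4": 0.3}
order = find_ribs(neighbors, lambda rib, found: U[rib],
                  lambda rib: rib != "L5", "L3")
```

Starting from the usually obvious rib "L3", the sketch finds L3, L2, R3, L4, R2, L1, and R4 in that order, skipping the unverifiable L5.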

A2.3.3 Pattern-Directed Invocation

Considerable attention has been focused recently on pattern-directed systems (see, e.g., [Waterman and Hayes-Roth 1978]). Another common example of a pattern-directed system is the production system, discussed in Section 12.3. The idea behind a pattern-directed system is that a procedure will be activated not when its name is invoked, but when a key situation occurs. These systems have in common that their activity is guided by the appearance of "patterns" of data in either input or memory. Broadly construed, all data form patterns, and hence patterns guide any computation. This section is concerned with a definition of patterns as something very much smaller than the entire data set, together with the specification of control mechanisms that make use of them.

Pattern-directed systems have three components.

1. A data structure or data base containing modifiable items whose structure may be defined in terms of patterns
2. Pattern-directed modules that match patterns in the data structure
3. A controlling executive that selects modules that match patterns and activates them

A popular name for a pattern-directed procedure is a demon. Demons were named originally by Selfridge [Selfridge 1959]. They are used successfully in many AI programs, notably in a natural language understanding system [Charniak 1972]. Generally, a demon is a program which is associated with a pattern that describes part of the knowledge base (usually the pattern is closely related to the form of "items" in a data base). When a part of the knowledge base matching the pattern is added, modified, or deleted, the demon runs "automatically." It is as if the demon were constantly watching the data base waiting for information associated with certain patterns to change. Of course, in most implementations on conventional computers, demons are not always actively watching. Equivalent behavior is simulated by having the demons register their interests with the system routines that access the data base. Then upon access, the system can check for demon activation conditions and arrange for the interested demons to be run when the data base changes.

Advanced languages that support a sophisticated data base often provide demon facilities, which are variously known as if-added and if-removed procedures, antecedent theorems, traps, or triggers.
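The registration scheme described above, sketched in Python: demons register an event ("if-added" or "if-removed") and a pattern with the data base, and the access routines check those registrations whenever an item changes. Items are tuples, and `None` in a pattern is a wildcard; the class, the event names, and the green-region/grass rule are hypothetical. In this sketch interested demons run immediately, one of several possible arrangements.

```python
class DemonDB:
    """Item data base whose access routines check registered demons."""

    def __init__(self):
        self.items = set()
        self.demons = []                     # (event, pattern, procedure)

    def register(self, event, pattern, procedure):
        """A demon registers interest; it does not poll the data base."""
        self.demons.append((event, pattern, procedure))

    @staticmethod
    def matches(pattern, item):
        # None in a pattern is a wildcard matching any single element.
        return len(pattern) == len(item) and all(
            p is None or p == x for p, x in zip(pattern, item))

    def _trigger(self, event, item):
        for ev, pattern, procedure in list(self.demons):
            if ev == event and self.matches(pattern, item):
                procedure(self, item)

    def add(self, item):
        if item not in self.items:
            self.items.add(item)
            self._trigger("if-added", item)

    def remove(self, item):
        if item in self.items:
            self.items.discard(item)
            self._trigger("if-removed", item)


db = DemonDB()
db.register("if-added", ("region", None, "green"),
            lambda d, item: d.add(("hypothesis", item[1], "grass")))
db.register("if-removed", ("region", None, "green"),
            lambda d, item: d.remove(("hypothesis", item[1], "grass")))
db.add(("region", "r7", "green"))
db.add(("region", "r8", "blue"))
```

Adding the green region "r7" silently creates a grass hypothesis for it, the blue region triggers nothing, and removing the green region retracts the hypothesis. Because demons may add items that trigger further demons, activation order can be hard to predict, which is the difficulty blackboards address below.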
A2.3.4 Blackboard Systems

In artificial intelligence literature, a "blackboard" is a special kind of globally accessible data base. The term first became prominent in the context of a large pattern-directed system to understand human speech [Erman and Lesser 1975; Erman et al. 1980]. More recently, blackboards have been used as a vision control system [Hanson and Riseman 1978]. Blackboards often have mechanisms associated with them for invoking demons and synchronizing their activities. One can appreciate that programming with demons can be difficult. Since general patterns are being used, one can never be sure exactly when a pattern-directed procedure will be activated; often they can be activated in incorrect or bizarre sequences not anticipated by their designer. Blackboards attempt to alleviate this uncertainty by controlling the matching process in two ways:

1. Blackboards represent the current part of the model that is being associated with image data;
2. Blackboards incorporate rules that determine which specialized subsystems of demons are likely to be needed for the current job. This structuring of the data base of procedures increases efficiency and loosely corresponds to a "mental set."

These two ideas are illustrated by Figs. A2.2 and A2.3 [Hanson and Riseman 1978]. Figure A2.2 shows the concept of a blackboard as a repository for only model-image bindings. Figure A2.3 shows transformations between model entities that are used to select appropriate groups of demons.
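A toy Python sketch of the two controls: the blackboard holds only model-image bindings, and a selection rule decides which group of knowledge sources (demons) is matched against it on each cycle, so demons outside the current "mental set" never fire. The entity names, the two groups, and the selection rule are invented for illustration and are far simpler than the VISIONS system cited above.

```python
from collections import namedtuple

# A knowledge source ("demon"): a precondition on the blackboard plus an action.
KnowledgeSource = namedtuple("KnowledgeSource", "name precondition action")


def run_blackboard(bb, groups, select_group, max_cycles=20):
    """bb: dict of model-entity -> image-evidence bindings (short-term memory).
    groups: group name -> list of knowledge sources.
    select_group: rule choosing which group is likely to be needed now.
    Only the selected group is matched against the blackboard each cycle."""
    for _ in range(max_cycles):
        group = select_group(bb)
        if group is None:
            return bb
        fired = False
        for ks in groups[group]:
            if ks.precondition(bb):
                ks.action(bb)
                fired = True
        if not fired:
            return bb
    return bb


groups = {
    "low-level": [KnowledgeSource(
        "segments->region",
        lambda bb: "segments" in bb and "region" not in bb,
        lambda bb: bb.update(region="closed contour from " + bb["segments"]))],
    "high-level": [KnowledgeSource(
        "region->object",
        lambda bb: "region" in bb and "object" not in bb,
        lambda bb: bb.update(object="window"))],
}


def select_group(bb):
    if "region" not in bb:
        return "low-level"
    if "object" not in bb:
        return "high-level"
    return None


bb = run_blackboard({"segments": "edges 12-17"}, groups, select_group)
```

Each cycle activates exactly one group: first the segment-to-region source, then the region-to-object source, after which the selection rule reports that nothing more is needed.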

(Figure panels: Short-Term Memory, image-specific model; Long-Term Memory, a priori general knowledge.)

Fig. A2.2 An implementation of the blackboard concept. Here the blackboard is called Short-Term Memory; it holds a partial interpretation of a specific image.

(Figure labels: CONSISTENCY, TOP-DOWN SCHEMAS; levels OBJECTS, VOLUMES, SURFACES, REGIONS, SEGMENTS, VERTICES.)

Fig. A2.3 Paths for hypothesis flow, showing transformations between model entities and the sorts of knowledge needed for the transformations.

REFERENCES

AHO, A. V., J. E. HOPCROFT, and J. D. ULLMAN. The Design and Analysis of Computer Algorithms. Reading, MA: Addison-Wesley, 1974.
ASTRAHAN, M. M. et al. "System R: A relational approach to data base management." IBM Research Lab, February 1976.
BALLARD, D. H. "Model-directed detection of ribs in chest radiographs." Proceedings, Fourth IJCPR, Kyoto, Japan, 1978.
BIRTWHISTLE, G. et al. Simula Begin. Philadelphia: Auerbach, 1973.
BOBROW, D. G., and B. RAPHAEL. "New programming languages for artificial intelligence." Computing Surveys 6, 3, September 1974, 155-174.
CHARNIAK, E. "Towards a model of children's story comprehension." AI-TR-266, AI Lab, MIT, 1972.
ERMAN, L. D., and V. R. LESSER. "A multi-level organization for problem solving using many diverse cooperating sources of knowledge." Proc., 4th IJCAI, September 1975, 483-490.
ERMAN, L. D., F. HAYES-ROTH, V. R. LESSER, and D. R. REDDY. "The HEARSAY-II speech-understanding system: Integrating knowledge to resolve uncertainty." Computing Surveys 12, 2, June 1980, 213-253.
FAHLMAN, S. E. "A planning system for robot construction tasks." Artificial Intelligence 5, 1, Spring 1974, 1-49.
FELDMAN, J. A. "High-level programming for distributed computing." Comm. ACM 22, 6, July 1979, 363-368.
FELDMAN, J. A., and R. F. SPROULL. "System support for the Stanford hand-eye system." Proc., 2nd IJCAI, September 1971, 183-189.
FLOYD, R. W. "The paradigms of programming." Comm. ACM 22, 8, August 1979, 455-460.
GOLDBERG, A., and A. KAY (Eds.). "SMALLTALK-72 Instruction Manual." SSL-76-6, Xerox PARC, Palo Alto, CA, 1976.
HANSON, A. R., and E. M. RISEMAN. "VISIONS: A computer system for interpreting scenes." In CVS, 1978.
HEWITT, C. "Description and theoretical analysis (using schemata) of PLANNER" (Ph.D. dissertation). AI-TR-258, AI Lab, MIT, 1972.
HEWITT, C., and B. SMITH. "Towards a programming apprentice." IEEE Trans. Software Engineering 1, 1, March 1975, 26-45.
KNUTH, D. E. The Art of Computer Programming, Vol. 1. Reading, MA: Addison-Wesley, 1973.
SELFRIDGE, O. "Pandemonium, a paradigm for learning." In Proc., Symp. on the Mechanisation of Thought Processes, National Physical Laboratory, Teddington, England, 1959.
SUSSMAN, G. J., and D. McDERMOTT. "Why conniving is better than planning." AI Memo 255A, AI Lab, MIT, 1972.
WATERMAN, D. A., and F. HAYES-ROTH (Eds.). Pattern-Directed Inference Systems. New York: Academic Press, 1978.


Author Index
Abdou, I. E.,76,77,84 Abelson, R. P., 322,334 Abramowitz, M., 496 Adams, J. A., 496 Aggarwal, J. K., 208, 216,220 Agin, G. J., 52,54,277,278 Aho, A. V.,359,502 Aikens, J. S., 407 Akatsuka, T., 79 Ambler, A. P.,237,346,359,366,370,375,439 Anderson, J. R., 320 Andrews, H. C , 40,65,496 Apostol, T. M., 496 Ashkar, G. P., 133, 137 Astrahan, M. M., 501 Attneave, F.,75 Badler, N. I.,216,217,219,220,280 Bajcsy, R., 184, 188,400,401,274,280 Ballard.D.H., 125,128,136,139,141,244,270,271,272, 273, 321,335,344,355,439,446, 502 Barnard, S.T., 208 Barnhill, R. E.,239,242,265 Barrow,H.G.,63,141,161,238,318,354,362,376,410, 413 Bartlett, F.C , 343 Baumgart, B.G., 266 Bellman, R., 137 Bentley, J. L.,281 Berge, C , 357 Berztiss, A.T., 358 Binford, T. O., 89,274,335 Birney,34 Birtwhistle, G., 501 Bittner,J. R., 364 Blinn, J., 95 Blum, H.. 252 Bobrow, D. G., 334,501 Boggess, L., 320 Bolles, R., 121,319,343,446 Bower, G. H., 320 Boyse, J. W., 282,285,289 Brachman, R. J., 323,326,331,396 Braunstein, M. L., 206 Bribiesca, E., 259 Brice, C , 157, 158,236 Brodatz, P., 166, 186 Bron, C., 359 Brooks, R. A., 335 Brown, C , 255,271,285,430,496 Buchanan, B.G., 407 Buckhout, R., 343 Chakravarty, I., 301 Charniak, E., 505 Chasen, S. H., 495 Chen, C. C., 80 Chien, Y. P., 136, 142 Chow, C. K., 152 Clinton, J. D., 271,496 Clocksin, W. F., 196,202,206 Clowes, M. B.,296 Collins, A., 384 Connors, R., 170 Conte, S. D., 496 Coons, S. A., 269

Corneil, D. G., 358,359 Cover, T. M., 182 Crowther, R. A., 59 Davis, L. S., 378,408 Davis, R., 398 deBoor, C , 239,242,496 DeGroot, M. H., 445 DeRosier, D. J., 57,59 Deliyanni, A., 390 Doyle, J., 347 Dreyfus, S., 137 Duda, R. O., 30, 123, 144,208,220,234,496 Eisenbeis, S. A., 155 Ejiri, J., 292 Elliott, G. L., 365 Elschlager, R. A., 142,360 Erman, L. D.,400, 505 Fahlman,S. E.,322,439,500 Falk, G., 292 Faux, I. D.,496 Feigenbaum, E.A., 407 FeIdman,J.A., 157,161,163,412,439,445,446,500,501 Fennema, C. L., 157, 158,236 Fikes, R. E., 395,396,397,439 Findler, N. V., 323 Fischler, M. A., 141,360 Floyd, R. W., 499 Fodor, J. D., 320 For.est, A. R., 265 Freeman, H., 211,235,236 Frei, W., 80 Freuder, E. C , 322,413,430 Fu, K. S 136, 142, 172, 173, 176, 181,238 Fukunaga, K.., 181 Funt, B.V., 322 Gallus,C , 236 Garfinkel, R. S., 429 Garvey,T. D., 54,319,439,446,449,452,453 Gelernter, H., 322 Geschwind, N., 342 Gibson, J. J., 168, 189, 196 Gips, J., 173 Goldberg, A., 501 Goldstein, I. P., 334 Gombrich, E. H., 343 Gomory, R. E.,429 Gonzalez, R. C , 25,65, 181 Gordon, R., 56 Gordon, W. J., 243 Gotlieb, C. C , 358,359

Graham, M., 206 Gramiak, R., 54 Granrath, D. J., 31 Greenberg, D., 33 Gregory, R. L., 343 Griffith, A. K...81 Guzman, A., 259,294

Hall, E. L., 186 Hannah, M. J., 68,89 Hanson, A. R., I l l , 150, 153, 161,505,506 Haralick, R. M., 184, 186,360,365,408,410 Harary, F., 357 Harlow, C. A., 155 Hart, P. E., 30, 123, 144,208,234,496 Hayes, P.J., 334,384,393,396,440 Hayes, PhilipJ., 331,334 HayesRoth, F.,397,399,504 Helmholtz, H. von, 196,319,348 Hendrix, G. G., 323,331,334,396 Henschen, L.,434 Herbrand, J., 389 Herman, G. T., 56, 146 Hewitt, C , 322,396,397,499, 500 Hinton, G. E.,335,420,425 Hodgeman, C. D.,496 Hopcroft, J. E , 172,359 Horn, B. K. P.,22,23,74,93,95, 104, 196,496 Horowitz, S. L., 157,233 Hough, P. V.C , 124 Hubel, D. H., 80 Hueckel, M. H., 76 Huffman, D. A., 296 Hummel, R. A., 82,420,430 Hunt, B.R., 65,40 Hurvich, L. M., 34 Ikeuchi, K., 93,99, 100 Jackins, C. L.,281 Jain,A. K., 18 Jain, R., 222 Jameson, D.,34 Jayaramamurthy, S.N., 179 Joblove, G. H., 33 Johansson, G., 210,215 JohnsonLaird, P.N., 319,320,321 Joshi, A. K.,400 Julesz, B., 169

Kak, A. C., 17, 25, 39, 76, 144, 153, 252, 496
Kanade, T., 300, 306
Kane, J., 496
Kaneko, T., 152

Kartus, J. S., 410
Kay, A., 501
Kelly, M. D., 121
Kender, J. R., 33, 169, 191
Kerbosch, J., 359
Kibler, D. P., 233
Kimme, C., 124, 125
King, J., 398
Kirsch, R. A., 79
Klug, A., 59
Knodel, W., 359
Knuth, D. E., 108, 498
Kosslyn, S. M., 320
Kowalski, R. A., 388, 390, 394, 396
Kruger, R. P., 186, 258
Lakatos, I., 267
Land, E. H., 34
Lantz, K. A., 125
Laws, K. I., 166, 185, 186
Lawton, D. T., 196, 199, 213, 214
Lee, D. N., 196, 206
Lee, Y. T., 287
Lehnert, W., 334
Lesser, V. R., 400, 505
Lester, J. M., 133, 136
Levine, M. D., 111
Lieberman, L., 184, 188
Lindsay, R. K., 407
Lishman, J. R., 196, 206
Liu, H. K., 82, 146
Loomis, J. M., 196
Loveland, D., 388, 390
Lozano-Perez, T., 319
Lu, S. Y., 173, 176

Mackworth, A. K., 291, 295, 301, 303, 439
Maleson, J. T., 188
Markovsky, G., 211, 289, 292
Marr, D., 90, 91, 93, 119, 252, 282
Martelli, A., 132, 143, 145
Martin, W. N., 220
McCarthy, J., 384, 395, 396, 440
McCulloch, W. S., 344
McDermott, D., 322, 325, 394, 396, 499
Mendelson, E., 383
Mero, L., 76
Merrill, R. E., 248
Milgram, D. L., 178
Minsky, M. L., 334, 335, 400
Mitchell, T. M., 407
Modestino, J. W., 133, 137
Montanari, U., 139
Moore, J., 334
Moore, R. C., 330
Moravec, H. P., 69, 89, 107, 208
Munsell, A. H., 33
Nagel, H.-H., 222
Nahin, P. J., 256
Nakayama, K., 196
Neisser, U., 343
Nemhauser, G. L., 429
Neurath, P. W., 236
Nevatia, R., 76, 335, 370, 372
Newell, A., 334, 390
Newman, W. M., 495
Nicodemus, F. E., 23
Nilsson, N. J., 132, 157, 320, 323, 331, 365, 387, 388, 390, 396, 397, 440, 446
Nishihara, H. K., 264, 282
Nitzan, D., 54
O'Connell, D. N., 210
Ohlander, R., 153
O'Neill, B., 276
O'Rourke, J., 216, 217, 280
Osteen, R. E., 359
Palmer, S. E., 320
Papert, S., 144
Paton, K. A., 239
Pavlidis, T., 109, 157, 233, 253, 254, 496
Pennington, K. S., 52
Persoon, E., 136, 238
Phong, B. T., 95
Pingle, K. K., 81
Poggio, T., 90, 91, 93
Pomerantz, J. R., 320
Popplestone, R. J., 52, 238, 362, 376
Potter, J. L., 220
Prager, J. M., 85, 196, 198, 199
Pratt, M. J., 496
Pratt, W. K., 17, 25, 65, 84, 181, 496
Prazdny, K., 196, 206
Prewitt, J. M. S., 77
Price, K. E., 221, 335
Pylyshyn, Z. W., 320

Quam, L., 68, 89
Quillian, M. R., 323

Raiffa, H., 445, 446
Ralston, A., 496
Ramer, U., 133
Raphael, B., 501
Rashid, R. F., 216, 217
Reddy, R., 221

Reingold, E. M., 359, 364
Reiter, R., 395, 396
Requicha, A. A. G., 231, 254, 265, 282, 287, 289
Riesenfeld, R. F., 239, 242, 265
Riseman, E. M., 111, 150, 153, 161, 505, 506
Roache, J. W., 216
Roberts, L. G., 76, 77, 234, 235, 292
Roberts, R. B., 334
Robertson, T. V., 153
Robinson, J. A., 383, 389
Rogers, B., 206
Rogers, D. F., 496
Rosenfeld, A., 17, 25, 39, 64, 76, 144, 153, 178, 252, 415, 416, 496
Rumelhart, D. E., 334
Russell, D. L., 100
Russell, D. M., 335
Rychner, M., 400
Sacerdoti, E. D., 439
Samet, H., 249, 251
Schank, R. C., 320, 322, 334
Schiff, W., 196
Schneier, M., 251
Schubert, L. K., 323
Schudy, R. B., 270, 271, 272, 273, 355
Schunck, B. G., 104, 196
Schwartz, S. P., 320
Selfridge, P. G., 122
Selfridge, O., 345, 505
Shani, U., 274, 277, 279
Shapira, R., 89, 211, 292
Shapiro, L. G., 360, 408
Shepard, R. N., 320
Shirai, Y., 81, 233, 292, 346, 439
Shortliffe, E. H., 407, 453
Sjoberg, R. W., 22, 23, 94
Sklansky, J., 79, 125, 136, 233, 258
Sloan, K. R., 401
Sloman, A., 320
Smith, A. R., 33
Smith, B., 501
Smoliar, S. W., 219
Snyder, W. E., 195
Sobel, I., 496
Soroka, B. I., 274
Sproull, R. F., 439, 445, 446, 495, 500
Stallman, R. M., 347
Stefik, M., 334
Stegun, I. A., 496
Steiglitz, K., 107
Stevens, K. A., 168, 191
Stiny, G., 173
Stockham, T. J., Jr., 74

Sugihara, K., 52, 301
Sussman, G. J., 322, 347, 396, 439, 442, 499
Tamura, H., 188
Tanimoto, S. L., 109, 281
Tate, A., 439
Teevan, 34
Tenenbaum, J. M., 33, 63, 81, 161, 318, 410, 413
Thompson, W. B., 207, 208
Tilove, R. B., 284
Tomek, I., 233
Tou, J. T., 181, 359
Tretiak, O. J., 76
Tsotsos, J. K., 207
Turner, K. J., 89, 234, 239, 292, 299, 301, 433
Ullman, J. D., 172, 359
Ullman, J. R., 358, 359, 360
Ullman, S., 210, 212

Vassy, Z., 76
Voelcker, H. B., 254, 282, 285, 289, 291

Waag, R. B., 54
Wallach, H., 210
Waltz, D. A., 299, 320
Warnock, J. G., 287
Warren, D. H. D., 394, 439
Waterman, D. A., 397, 399, 504
Wechsler, H., 79, 125, 136
Wesley, M. A., 211, 289, 292, 319
Weszka, J. S., 181
Wexler, C., 496
Weyl, S., 33
Whitted, T., 95
Wiesel, T. N., 80
Wilks, Y., 334
Will, P. M., 52
Winograd, T., 322, 334, 384, 395, 396
Winston, P. H., 333, 370
Wintz, P., 25, 65
Woodham, R. J., 93, 98
Woods, W. A., 323
Wu, S., 243

Yakimovsky, Y., 157, 161, 163, 412, 446
Young, I. T., 256

Zadeh, L., 422, 423
Zucker, S. W., 82, 85, 150, 172, 408, 420, 430

Subject Index
A* algorithm, 132
A priori probabilities in plan scoring, 449
Abstraction in knowledge representation, 320, 505
Acting and planning cycle, 315, 347
Action:
  frame problem, 444
  plans, 441, 446
Active:
  imaging, 14
  knowledge, 384, 430-434
Algorithm:
  A*, 132
  boundary evaluation for solids, 288-291
  correlation by binary search, 108
  directed-graph isomorphism by backtrack search, 364
  discrete labeling, 410-415
  edge detection by dynamic programming, 140
  edge detection, hierarchical, 109
  edge relaxation, 85
  ellipse detection with Hough algorithm, 127
  fast Fourier transform, 490-492
  generalized Hough algorithm, 129
  heuristic search, 132
  line detection with Hough algorithm, 123
  mass properties of solids, 285-286
  medial axis transformation, 252
  multiframe optical flow calculation, 105
  nondeterministic for graphs, 359
  optical flow by relaxation, 104
  piecewise linear curve segmentation, 234
  quad tree generation, 250
  region boundary melting, 159
  region growing, semantic, 162
  region growing by blob coloring, 151
  region merging with adjacency graph, 160
  region splitting, recursive, 153
  region splitting and merging, 157
  set membership classification, 284
  shape from shading, 100
  shape number calculation, 260
  solid to surface representation conversion, 290
  stereopsis, 91
  strip tree curve-region intersection, 247
  strip tree generation, 244
  strip tree intersection, 245
  tumor detection, 345
Aliasing, 41
Ambiguity in grammars, 172
Analog-digital conversion, 50
Analogical models (See Knowledge representation, analogical and propositional)
And/or trees as plans, 453-459
Aperiodic correlation, 67
Applications of computer vision, 12
Arcs in semantic nets, 324
Area:
  chain code, 235-236
  cross-section of generalized cone, 278
  in location networks, 336
  polygon, 235
  quad trees, 251
  region, 254
Array grammar, 178-181
Association graph, 358, 365-369
Associative recall, 334
Asynchronous relaxation, 412

Atomic formula in logic, 384
Attention, control of, 340
Automated inference systems, 396

B-spline, 239-243
Background subtraction:
  low-pass filtering, 72
  spline surface, 72
Backtrack search, 363-365, 372-375
  automatic, 499
  variations and improvements, 364-365
Backward chaining, 342, 399
Bandlimited signal, 41
Basis for color space, 33-35
Bayes' rule, 449 (See also Decision, theory and planning)
Bayesian decisions and region growing, 162
Belief maintenance, 319, 346
Bending energy of curve, 256
Binary search correlation, 108
Binary tree, 244
Binocular imaging, 20-22 (See also Stereo vision)
Binormal of space curve, 276
Blackboard, 505
Blob finding, 143-146, 151
Block stacking, 322, 438-443
Blocks world:
  vision, 291
  structure matching, 370-372
Bottom-up (See Control; Inference)
Boundary, 75, 265
  conditions for B-splines, 241
  detection, 119-148
    in binary images, 143
    divide and conquer, 122
    dynamic programming, 137-143
    Hough algorithm, 123-131
  evaluation, 288-291
  as graph, 131
  representations, 232-247, 265-274
Branch and bound search:
  backtracking improvement, 364
  for boundaries, 136
Breakpoints in linear segmentation, 232

Calculus, predicate (See Predicate logic)
Camera model and calibration, 481-484
Cartesian coordinate system, 465
CAT imagery, 1, 56-59
Cell decomposition volume representation, 281
Centroid of volume, 285
Chain code, 235-237, 256, 258
  area calculation, 236
  derivative, 236
  merging, 236
  normalized, 236
Chamfer matching, 354
Charge transfer devices, 49
Chessboard metric, 39
Chest radiograph understanding, 321, 344-346
Chromaticity diagram, 37
Chunks of knowledge, 334
Circular arcs, 237
City block metric, 38
Classification:
  in pattern recognition, 181-184
  set membership, 284
  tree for regions, 163
Clause form of predicate logic, 384
Clique, and use in matching, 358, 366-369, 375
Closed curves, 246
Closure operator for sets, 282
Clustering:
  motion detection, 217
  parametric and nonparametric for pattern recognition, 181-183
Coroutining, 498 (See also Control)
Coherence:
  of knowledge representation, 320
  rule for line-drawing interpretation, 297
Collision detection with optic flow, 201
Color, 31-35
  bases, 33-34
  space histograms, 153-155
Comb, Dirac, 19, 40
Combining operators for volumes, 282
Compactness of region, 256
Completeness of inference system, 389
Complexity of graph algorithms, 359
Component, r-connected, 369, 380
Computer as research tool, 9
Concave line label, 296
Concavity tree of region, 258
Cone, generalized (See Generalized, cone)
Confidence:
  planning, 415 (See also Supposition value in relaxation)
  region growing, 164
Conic, 239, 488-489
Conjunctive normal form for logic, 388
Connect line label, 303
Connected:
  component of graph, 369, 380
  region, 150, 255
Connectives of logic, 385
Connectivity:
  difference, 375
  image, 36
  matching, 372-375
CONNIVER, 322
Consolidation, 102 (See also Pyramid)
Constraint (See also Relaxation)
  inconsistency, 427
  as inequality in linear programming, 423
  labeling, 408-410
  n-ary, 410
  propagation, 299, 413-415
  relaxation, 408-430
  satisfaction for belief maintenance, 347
  semantic, on region growing, 160-164
Constructive solid geometry volume representation, 282
Context:
  data base, 440
  switching, 500
Continuity of knowledge representation, 320
Contour:
  following, 143-146
  occluding, 101
Contrast enhancement, 71
Control, 315, 340-350, 497-502
  bottom-up or data-driven, 341, 344-346
  hierarchical and heterarchical, 341-346
  in knowledge representations, 317
  message passing, 501
  mixed top-down and bottom-up, 344-346
  structures, standard and nonstandard, 497-500
  top-down, 342, 343-346
Convergence of relaxation algorithms, 414, 418
Conversions, logic to semantic nets, 332
Convex:
  decomposition of region, 253
  line label, 296
  region, 258
Convolution, 25, 68
  theorem, 30
Cooperative algorithms, 408-430
Coordinate systems, definitions and conversions, 465-468
Correctness of inference system, 389
Correlation, 25, 30, 66-70
  binary search, 108
  coefficient, 419
  metrics, 362
  nonlinear for edge linking, 121
  normalized, 68-70
  periodic and aperiodic, 67
  texture, 187
Correspondence problem, 89
Cost:
  of planning, 452
  in plans, 445-459
Crack edges, 78
Curvature:
  boundary, 256
  in evaluation function, 133
  space curve, 276
Curve, 231
  detection, Hough algorithm, 126
  fitting, 487
  intersection, 247
  segmentation techniques, 233-234
Cutting planes in linear programming, 428
Cylinder, generalized, 274-280
Cylindrical coordinate system, 466

Data:
  base, 398, 431, 440
  driven control, 341-346
  fitting, 239, 484-488
  nodes in location networks, 336, 338
  structure for boundaries, 158
Decision:
  theory and planning, 446-453
  trees for matching, 370-377
Decomposition:
  region, 253
  solid, 287
Default values in knowledge representations, 330, 334-335
Delete list, 440
Delta function, 18-19, 40
Demon, 412, 429, 505
DeMorgan's laws, 387
Densitometer, 46
Density of image, 44, 74
Dependence, gray-level, 186-188
Depth:
  -first search and variations, 136, 363-365, 372, 412
  from optic flow, 201
Determinant, 473
Difference measurement in motion, 221
Digital images, 35-42
Digitizers, image, 45
Dirac comb, 19, 40
Direction-magnitude sets, 270
Discrete:
  images, 35-42
  knowledge representation, 320
  labeling algorithms, 410-415
Disparity, 21, 89, 208
Dispersion of knowledge representation, 320
Distance:
  on discrete raster, 36
  image (See Image, range)
Distortion, perspective (See Projection, perspective)
Divergence theorem for mass properties, 288
Divide and conquer:
  algorithms for CSG, 285
  method for boundary detection, 122
Domain-dependent and independent motion understanding, 196-199, 214-219
Drum scanner, 46

Dual graph, 159
Dynamic programming and search, 137-143
Early processing, 63-65
Eccentricity of region, 255
Edge, 75
  detection
    in binary images, 143-146
    from optic flow, 202-206
    in pyramids, 109
  following, 131-146
    as blob finding, 143-146
    as dynamic programming, 137-143
    as graph search, 131-143
  labels, 296-297
  linking, 119-131
    known approximate location, 121-122
    problems with, 119-120
    Hough algorithm, 123-131
  operator, 64, 75-88
    gradient, 76-80
    Kirsch, 79
    Laplacian, 76-79
    performance, 77, 83-84
    relaxation, 85-88
    templates, 79
    3D performance, 81-83
  profiles, 75
  representation for surfaces, 266
  strength in evaluation function, 133
  thresholding, 80
Eigenvalues and eigenvectors, 473, 486-487
Element, texture, 166
Elongation of region, 255
Enclosing surface, 265
Energy, texture, 187
Engineering:
  drawings, 291
  knowledge, 407
Entropy, texture, 187
ERTS imagery, 46
Euclidean metric, 38
Euler number of region, 255, 266
Evaluation:
  function for heuristic search, 133
  mechanism in semantic networks, 337
Existential quantifiers, 385
Extended inference, 315, 319, 322, 383, 395-396
Extensional concepts in knowledge representation, 328

Faces, 271
  for surface representation, 265, 271
Feature:
  classification and matching, 376-378
  texture, 184-186
  vectors and space, 181
Field, television, 46
Figure-ground distinction, 4
Filtering, 25, 64-75
First order predicate logic (See Predicate logic)
Fitting data (See Data, fitting)
Flatbed scanner, 46
Flying spot scanner, 45
Focal length, 19, 479
Focus of expansion in optical flow, 199
Formal inference system, 390
Forward chaining, 342, 399
Fourier
  descriptors, 238
  filtering, 65
  transform, 24-30, 490-492
Frame:
  problem, 395, 444
  system theory, 334-335
Frenet frame and formulae, 276
Function:
  image, 18-19
  logic, 385
  Skolem, 387
G-Hough algorithm, 128
Gamma, film, 45
Gaussian sphere, 101, 270
Generalized:
  clipping, 284
  cone, 274-280
    matching to data, 278, 372-375
  cylinder (See Generalized, cone)
  image, 6, 14, 320
Geodesic tesselation, 271, 493
Geometric:
  matching, 354
  operations in location networks, 336
  relations and propositions, 332
  representations, 8, 227-311
  structures and matching, 354
  transformations, 477-481
Geometry theorem prover, 322
Gestalt psychology, 116
Goal achievement, 319, 346-347, 438-439
Goodness of fit, 273
Gradient:
  edge operator, 76-80
  space, 95, 301
  techniques, 355
  texture, 168, 189-193
  use in Hough algorithm, 124
Grammar:
  ambiguous, 172

  array, 178-181
  on pyramid, 179
  shape, 173-174
  stochastic, 172
  texture, 172-181
  tree, 175-178
Graph, 131
  adjacency for regions, 159
  algorithms, complexity, 359
  association, 358, 365-369
  dual, 159
  isomorphism, 357-359, 364
  matching, 355
  r-connected component, 369, 380
Gray level, 18, 23, 35
  dependence matrices for texture, 186-188
Grazing incidence, 111
H&D curve, 44
Heart volume, 273
Heterarchical control, 341-346, 499
Heuristic search:
  boundaries, 131-133
  dynamic programming, 143
  region growing, 157
Hierarchical (See also Pyramid)
  abstractions, 505
  control, 341-346
  textures, 170
High-level:
  models, 317
  motion detection, 196-199
  vision, 2-6
Hill-climbing and matching, 355
Histogram, 70
  equalization and transformation, 70
  splitting for thresholds, 152
  color space, 153-155
Homogeneous:
  coordinate system, 467
  regions, 150
  texture, 188
Hough algorithm, 123-131
  generalized, 128-131
  refinements, 124
  vanishing points, 191
Human body for motion understanding, 214-219
Hungry monkey planning problem, 445
Hypotheses, 343, 384, 422
  active knowledge, 431
Hypothetical worlds, 432, 440
Iconic structures and matching, 353
Icosahedron, 492
IHS color basis, 33
Image (See also Imaging)
  aerial, 1, 335
  CAT, 1, 56-59
  connectivity, 36
  digital, 35-42
  digitizers, 45
  distance on raster, 36
  edges, 75-88
  ERTS or LANDSAT, 46
  formation, 17
  function, 18-19
  generalized, 6, 14, 320
  histogram, 70
  intrinsic, 7, 14, 63
  irradiance, 23, 73
  orthicon, 47
  plane, 19
  processing, 2, 17, 25
  range, 52-56, 64, 88
  sampling, 18, 35
  segmented, 7
  sequence understanding, 207-222
  ultrasound, 54
  variance, 69
Imaging:
  active, 14
  devices, 42-59
  geometry, 19-21
  light stripe, 22
  model, 17-42
  monocular and binocular, 19-22
  stereo, 20-22, 52-54, 88-93, 98
Inconsistent labeling, 410 (See also Labeling)
Indexing property of semantic nets, 324
Inequalities in linear programming, 422, 427
Inference, 314, 319-321, 383
  bottom-up and top-down, 392
  extended, 315, 319, 322, 383, 395
  rules of, 388
  in semantic nets, 327
  systems, formal and informal, 390
  syllogistic, 321
Infinity, point at, 20
Informal inference system, 390
Inheritance of properties in knowledge representation, 330, 335
Inhibitory local evidence in line drawings, 295
Intensional concepts in knowledge representation, 328
Interaction graph for dynamic programming, 143
Interest operator, 69, 208
Interior operator, 282
Interpolation, 489-490
Interpretation:
  matching, 352
  region growing, 160-164
Interpreter
  production system, 398-399
  semantic net, 326, 339
Intersection of strip trees, 244
Interval, sampling, 35
Intrinsic:
  image, 7, 14, 63
  parameters, 63

Inverse:
  perspective (See Projection)
  relations, 331
Irradiance, image, 23, 73
Isomorphism, graph and subgraph, 357-359, 364
Iterative region merging with semantics, 163
Job queue, 502-504
K-d tree, 281, 287
Knowledge:
  base, 317, 318-323
  chunking, 334
  engineering, 407
Knowledge representation, analogical and propositional, 9, 314, 319-322 (See also Active knowledge, Predicate logic, Procedural embedding, Production systems, Representation, Semantic nets)
Labeling, 296-301, 408-420
  compatibilities, 415
  consistent, inconsistent, optimal, 408, 410
  discrete, 410-415
  interpretation, 315, 353, 383, 408
  lines, 296-301
  scene, 408-430
  tree search, 412
Lambertian surfaces, 94
LANDSAT imagery, 46
Laplacian operator, 76-79
Laser rangefinders, 54 (See also Image, range)
Learning, 315
Least-squared error fitting, 484-488
Legendre polynomial, 272
Light:
  flux, 22
  stripe imaging, 22
  structured, 52-54
Line:
  detection, Hough algorithm, 123
  drawing understanding, 265, 291-307
  drawings for motion understanding, 220-222
  equation, 475
  fitting, 484-487
  labeling, 296-301
  labeling by relaxation, 416-421, 428
  representations, 474-476
  segment, 232
  transformation, 480
Linear:
  structure matching, 378-380
  transformation, 473
Linear programming for relaxation, 420-430
Linking edges (See Edge, linking)

Local evidence in line drawings, 294
Location networks, 335-340
Logarithmic filtering, 73
Logic (See Predicate logic)
Long-term memory, 400
Low-level:
  motion detection, 196
  vision, 36
Manhattan metric, 36
Mass properties, 285
Matched filtering and Hough algorithm, 124
Matching, 315, 352, 398
  blocks world structures, 370-372
  clique finding, 366-369, 375
  complexity, 359
  examples, 369-380
  expectation-driven, 353
  generalized cylinders, 372-375
  geometric structures, 354
  graphs, 355
  iconic structures, 353
  knowledge representation, 319
  line drawings to primitives, 293
  linear structures, 378-380
  metrics, 360-362, 375, 378
  metrical in line drawing understanding, 294
  nondeterministic algorithm, 359
  optic flow, 208
  optimization, 354
  pattern (See Pattern, matching in production systems)
  relational structures, 353, 355, 365-372
  rules in production system, 399-400
  templates and springs, 360-362
  topological, for line drawings, 293
Matrix algebra, 471-474
Matte reflectivity, 23
Maximal clique, 366
Medial axis transform, 252-253
Membership array for region, 248
Memory, long-term and short-term, 400
Merging:
  branches for backtracking improvement, 364
  curve segmentation, 233
  regions, 155-160
Message-passing (See Modules and messages)
Metric:
  distance on raster, 36
  matching, 360-362, 375, 378
Minimum:
  cost search for boundaries, 132
  spanning tree for clustering, 217
Model:
  analogical and propositional, 319
  driven control, 342

  human body for motion, 217-219
  in knowledge representation, 9, 317
Modules and messages, 500-502
Modus Ponens and Modus Tollens, 388
Moment of inertia, 255, 286, 473, 486
Monocular imaging (See Imaging)
Motion, 195 (See also Optic flow)
  adjacency and collision detection, 201
  body model, 214-219
  common in sequence, 199, 208
  consistent match, 199
  continuity, 197
  depth, 201
  human body, 214-219
  image sequences, 207-222
  maximum velocity, 198
  moving light displays, 214-217
  observer, 206
  rigid bodies, 197, 210-214
  surface orientation and edge detection, 202
Multi-
  dimensional histograms, 153-155
  modal sensor, 453-459
  resolution images, 100-110 (See also Pyramid)
Nearest-neighbor clustering, 183
Network:
  interpreter, 326
  representation, 391 (See also Semantic nets)
Newton-Raphson, 493
Node types in semantic nets, 324-329
Noise, 65
Nonclausal form, 385-387 (See also Predicate logic)
Nondeterministic algorithms, 359
Nonrigid:
  body motion understanding, 214-217
  solids, 264
Nonstandard:
  control structures, 499-507 (See also Control)
  inference (See Extended inference)
Normalized correlation, 68-70
NP-completeness, 359
NTSC, 34

Object identification in line drawings, 294
Occluding:
  contour, 101
  line label, 296
Oct-tree, 281, 287
Office scene understanding, 453-459
Operator (See Edge, Interest operator, Interior operator, Closure operator for sets, Planning, Relaxation, etc.)
Opponent processes, 33
Optic flow, 65, 102-105, 196, 199-206
Optical system analysis, 23
Optimal labeling, 410
Optimization:
  linear programming, 424-425
  matching, 354
Orientation of surface (See Surface, orientation calculation)
Origami world, 300
Orthicon, image, 47
Orthographic projection (See Projection)

PANDEMONIUM, 345
Parallel:
  computation, 64, 341, 360
  iterative refinement in graph matching, 358, 378
  iterative relaxation, 64, 412
Parameter:
  optimization as matching, 354
  space, 123
Parametric:
  clustering, 183
  edge models, 80-81
  line representation, 476
Parseval's theorem, 256
Partial:
  knowledge in location networks, 339
  matches, 360-362
Partition:
  feature space in pattern recognition, 181
  Fourier space, 185
  semantic nets, 331, 391
  space, 150
Pattern:
  directed invocation, 321, 504
  matching data base (See Data, base)
  matching in production systems, 399-400
  recognition, 2, 181-184
  texture, 166
Performance, edge operators, 77, 83-84
Periodic:
  correlation, 67
  function, 237
Perspective (See Projection)
Photography, 44-45
Photometric stereo vision, 98
Picture element (pixel), 36
Piecewise polynomial, 240
Plane:
  curves and regions, 231
  cutting in linear programming, 428
  representation, 476
  transformation, 480
PLANNER, 322

Planning, 314-319, 347, 438-445
  cost of, 452
  costs, 445-459
  edge linking, 121-122
  example, 453-459
  extended and/or graphs, 451
  problem reduction, 394
  symbolic, 439-445
Point:
  at infinity, 20
  membership, 246
  projection (See Projection, perspective)
  spread function, 28
Polar and polar space coordinate systems, 465
Polygon:
  area calculation, 235
  images for motion, 220-222
  regions, 294
Polyhedra, 291
Polylines, 232-235
Pre- and postconditions in plans, 441
Predicate logic, 383-395
  clauses and semantic nets, 390-392
  decidability, 388
  extensions (See Extended inference)
  inference, 315
  knowledge representation, 392-395
  proof, 388
  strengths and weaknesses, 393-394
  syntax and semantics, 384-387
  truth table, 386
Predicates in location networks, 336
Primitive:
  solids in volume representation, 280, 282
  volumes for line drawing understanding, 293
Principal axes:
  for fitting data, 473, 486
  of inertia of region, 255
Procedural embedding of knowledge, 321, 322, 406, 430-434 (See also Active, knowledge)
Processing, image, 17
Product of inertia of solid, 286
Production systems, 315, 383, 396-408
  example, 401-406
  rule matching, 399
  strengths and weaknesses, 406-408
Projection:
  inverse perspective, 481-484
  orthographic, 20, 212
  perspective, 19-20, 214, 479
Proof of logic, 388-390
Propagation of constraints (See Constraint)
Property inheritance (See Inheritance of properties in knowledge representation)
Propositional model (See Representation)

Prototype situations, 334
Pruning for backtracking improvement, 364
Pseudoinverse for fitting, 485, 487
Psi-s curve, 237, 238, 256
Pyramid, 15 (See also Quad tree; Strip trees)
  edge detection, 109
  grammars, 179
  multiresolution, 65, 106, 249, 281
  thresholding, 155
Quad tree, 249-252
  area, 251
  generation, 250
Quantifier, logical, 385
Radiance and gray levels, 22-23
Radiograph understanding, 321, 344-346, 502-504
Range image (See Image)
Ray casting, 280
Reciprocity failure of film, 44
Reconstruction, two-dimensional image, 56-59
Recursive procedure, 244, 498
Reflectance, 93-95
  calculation, 73-75
  functions, common, 23, 94
  map, 96-98
  models, 22-24
Region, 149-150, 231
  data structures for, 158
  finding, by thresholding, 152-155
  finding, local techniques, 151
  growing and heuristic search, 157
  growing and semantics, 160-164
  homogeneous, 150
  partition, 150
  properties, 254-261, 376-377
  representation, 232-254
    by boundary, 232-247
    decompositions, 253
    medial axis transform, 252
    nonboundary, 247-254
    quad trees, 249
    spatial occupancy array, 247
    y-axis, 248
  splitting and merging, 155-160
Regular:
  sets and set operators, 231, 282-283
  tesselation, 170
Relational:
  functions and dynamic programming, 142
  models, 9, 314, 317
  semantic nets, 325, 330, 408
  structures and matching, 353, 355, 365-372
Relaxation, 64 (See also Constraint)

  algorithms, convergence properties, 414
  asynchronous, 412
  convergence, 414
  edge operators, 85-88
  for optic flow, 208
  labeling, 408-430
  line labeling, 416-421, 428
  linear and nonlinear operator example, 415-420
  as linear programming, 420-430
  optical flow, 103
  serial and parallel iterative, 412
  shape from shading, 99-102
  stereo, 89
Representation:
  actions in symbolic planning, 441
  analogical and propositional, 9, 314, 319-322
  conversion, 289
  knowledge, 317-347 (See Knowledge)
  matching, 352-355
  predicate logic, 392-395
  range of, 6-9
  solids, 264
  world in symbolic planning, 439
Resolution
  gray levels, 35
  pyramids, 15, 106-110
  multiple in thresholding, 155
  rule of inference, 389
  spatial, 36-37
  texture, 169
  theorem proving, 388-390
Response, human spectral, 31
Rewriting rule (See Rule, Grammar, Production system)
Rib finding, 321, 502-504
Rigidity assumption in motion understanding, 210
Root finding, 493
Rotation rigid transformation, 477
Rotational sweep, 274
Rule:
  based systems, 397 (See also Production systems)
  inference, 388
  production system, 398
  rewriting, 172, 383, 398
    for texture generation, 172-181

Sampling:
  image, 18, 35
  tesselation, 35
  theorem, 39
Scaling matrix, 478
Scanner, digitizing, 45
Scene:
  irradiance, 73
  labeling, 408-430

Scope of quantifier, 385
Scoring plans, 445-459
Search:
  backtrack, 363-365, 372-375
  depth-first, 412
  graph, and region growing, 157
  heuristic and variations, 132-136
  tree, for labeling, 412
Segmentation, 7, 116 (See also Edge; Line; Region)
Semantic nets, 315-317, 323-340, 390, 396
  arcs, 324
  conversion with other representations, 332
  examples, 334
  indexing property, 324
  inference, 327
  nodes, 324, 328
  partitions, 331
  predicate logic, 390-392
  relations in, 325, 330
  semantics and partitions, 331
Semantics:
  of images and region growing, 160-164
  of logic, 385
Semi-
  decidability of logic, 388
  regular tesselation, 171
Sentence of logic, 384
Serial:
  computation, 341
  relaxation, 412
Set
  membership classification, 284
  operations in location networks, 336
  operations in three dimensions, 284
Shadow line label, 296
Shape, 228 (See also Three dimensional)
  detection, Hough algorithm, 128-131
  grammar, 173-174
  properties of region, 254-261, 376-377
  recognition, 228-229
  from shading, 65, 93, 99-102
  from texture, 189-193
Shift theorem, 30
Short-term memory, 400
Signature of region, 257
Silhouette detection, Hough algorithm, 128-131
Similarity:
  analysis in motion, 221
  tree for regions, 261
Simplex algorithm, 423
Simulation with knowledge representation, 320
Skeleton (See Medial axis transform)
Skew:
  symmetry, 306
  transformation, 479

Skilled vision, 347
Skolem function, 387
Slope density function of boundary, 256
Slots in frames, 334
Smoothing image (See Consolidation)
Solid (See Three dimensional)
Spaces, color (See Color)
Spatial:
  representations (See Three dimensional)
  resolution, 36, 37
Spectral response, human eye, 31
Specular reflectivity, 23 (See also Reflectance, functions, common)
Spherical:
  coordinate system, 270, 466
  function, 270
  harmonic surfaces, 271-274, 355
  trigonometry, 469
  volume representation, 279
Splines (See B-spline)
Splitting:
  curve segmentation, 233-234
  regions, 155-160
Statistical:
  pattern recognition, 2, 181-184
  texture model, 168
Stereo vision, 20-22, 52-54, 88-93, 98
Stochastic grammars, 172 (See also Grammar)
Streaks and strokes, 134
Strip trees, 244-247
STRIPS, 396
Structural:
  matching, 355, 365-372
  models of texture, 170-181
Structured light, 52-54
Subgoals, 343, 438 (See also Goal achievement; Planning)
Subgraph isomorphism, 357-359, 375 (See also Graph)
Supposition value in relaxation, 415, 419
Surface:
  direction-magnitude set, 270
  functions on Gaussian sphere, 270
  geometry and line drawings, 301-307
  orientation calculation, 64, 93-102, 189-193, 202-206
  patches, 269
  representations, 265-274
  set of faces, 265
  spherical harmonic, 271-274
  from volume calculation, 288-291
Sweep representations of solids, 274-280 (See also Three dimensional)
Syllogistic inference, 321
Symbolic planning (See Planning)
Symmetry, skew, 306
Synchronization, 341

T vertices, 294
Tables, dynamic programming, 137-139
Tangent to space curve, 276
Television cameras, 46-52
Template:
  matching, 65
  and springs for matching, 360-362
Term in logic, 384
Tesselation:
  geodesic, 492
  regular and semiregular for texture, 170-172
  sampling, 35
Texture, 166-168, 404
  correlation, 187
  element (texel), 166, 169, 188
  element placement tesselations, 170-172
  energy, frequency and spatial domain, 184-185
  features, 187-188
  gradient and surface orientation, 168, 189-193
  grammars, 172-181
  homogeneity, 188
  pattern recognition, 181-184
  resolution issues, 169
  shape and vanishing point from, 189-193
  statistical and structural models, 168, 170-181
Theorem:
  convolution, 25
  divergence, 288
  Parseval's, 26, 256
  sampling, 39
  shift, 30
Theorem proving by resolution, 388-390 (See also Predicate logic)
Thinning algorithms, 253 (See also Medial axis transform)
Three dimensional:
  contour following in, 146
  decomposition, 287
  edge operators, 81-83
  image, 88-93
  model, 320
  objects, several representations, 264, 274-283
  primitives for line drawing understanding, 293
  shapes, 228
  structure from image sequence, 210-214
  volume algorithms, 284
Threshold:
  determination from histogram, 152
  multidimensional space, 153
  multiple resolution, 155
  for region finding, 152-155
Token-type distinction, 328
Top-down (See Control and Inference)
Topological connectivity and matching, 293, 372-375

Torsion of space curve, 276
Transformation:
  geometric, 477-481
  lines and planes, 480
Translation rigid transformation, 479
Translational sweep, 274
Tree:
  grammars, 175-178
  quad, 249-252
  rearrangement for backtracking improvement, 364
  search for labeling, 412
  strip, 244-247
Triangulation (See Stereo vision)
Trigonometry, plane and spherical, 468-469
Truth table, 386
Tumor detection, 344-346
Turtle algorithm for blob finding, 144
TV (See Television cameras)
Two-dimensional shape (See Shape)
Type-token distinction, 328

Ultrasound, 54, 273
Unambiguous representation, 231
Undecidability of logic, 388
Units, meaningful, 116 (See Segmentation)
Universal quantifiers, 385
Unsatisfiability in logic, 388-389
Utility theory and planning, 446, 453

Vanishing point from texture, 191
Variable nodes in semantic nets, 329
Variance, image, 69, 208
Vector algebra, 469-471
Velocity (See Motion, Optic flow)
Vergence, 21
Vertex:
  catalogues, 298, 300
  types, 295
Vidicon, 48
Viewpoint (See Projection)
Virtual nodes in semantic nets, 328
Vision:
  high- and low-level, 2-6
  as planning, 69
  system organization, 352
Volume (See Three dimensional)

Waltz filtering, 299 (See also Constraint; Labeling)
Well-formed formulae of logic, 385
Winged edge for surface representation, 266-267
Wire frame objects, 291
  from projections, 211
World states in planning, 439

Y-axis region representation, 248
YIQ color basis, 34

FIG. 27a

FIG. 28a

A painting by Louis Condax; courtesy of Eastman Kodak Company and the Optical Society of America.

Courtesy of D. Greenberg and G. Joblove, Cornell Program of Computer Graphics.

FIG. 2

Courtesy of Tom Check

FIG. 54a

Courtesy of Sam Kapilivsky.

FIG. 54b

Courtesy of Sam Kapilivsky

FIG. 54c


Courtesy of Robert Schudy.

FIG. 113a

Courtesy of Robert Schudy.

COMPUTER VISION

Dana H. Ballard, Christopher M. Brown
What information about scenes can be extracted from an image using only basic assumptions about physics and optics? How are images segmented into meaningful parts? At what stage must domain-dependent, prior knowledge about the world be incorporated into the understanding process? How are world models and conceptual knowledge represented and used? These and many other questions, inherent in this relatively new and fast-growing field, are explored and answered in Computer Vision by Dana H. Ballard and Christopher M. Brown. The authors assemble crucial material from many disciplines, including artificial intelligence, psychology, computer graphics, and image processing, to form a practical text and reference for anyone involved in building vision systems. Ballard and Brown write in their preface, "Computer Vision has a strong artificial intelligence flavor, and we hope this will provoke thought. The text shows how both intrinsic image information and internal models of the world are important in successful vision systems." Divided into four parts, Computer Vision offers descriptions of objects at four levels of abstraction:

Generalized images: images and image-like entities;
Segmented images: images organized into subimages that are likely to correspond to "interesting objects";
Geometrical structures: quantitative models of image and world structures;
Relational structures: complex symbolic descriptions of image and world structures.

PRENTICE-HALL, INC., Englewood Cliffs, N.J. 07632

ISBN 0-13-165316-4
