Authors: Richard Hartley and Andrew Zisserman
Publisher: Cambridge University Press – 655 pages
Book Review by: Venkat Subramaniam
Computer vision has emerged in recent decades as a field whose purpose is to enable a computer to “see” as humans do. This was a very difficult task in the 1960s, and four decades later, at the end of the century, “the task was still unresolved and formidable,” Olivier Faugeras writes in his Foreword to this unusual and refreshing book on this pioneering field of science.
In the effort to develop artificial intelligence, enabling a computer to “see” has been one of the most important objectives. But computer vision, as a child of artificial intelligence, is probably still in its adolescence. Computer science and mathematics are, if you will, siblings of computer vision, and the neurosciences, physics, and the psychology of perception seem to be its cousins.
Why has it taken decades to enable computers to “see”? Faugeras writes that it is because researchers had overlooked the important fact that perception in general, and visual perception in particular, are far more complex in animals and in humans than they had initially thought. New findings will hopefully empower investigators to better understand biological perception.
Faugeras asserts: “There is of course no reason why we should pattern computer vision algorithms after biological ones,” and “the fact of the matter” is that:
- The way biological vision works is still largely unknown and therefore hard to emulate on computers, and
- Attempts to ignore biological vision and re-invent a sort of silicon-based vision have not been as successful as initially expected.
There have, however, been some practical as well as theoretical successes in understanding and implementing what has so far been learned about vision.
On the practical side, it is now possible to guide vehicles, such as cars and trucks, on regular roads as well as on rough terrain using computer vision technology. Developing such a capability requires computers to perform real-time three-dimensional dynamic scene analysis. This is an elaborate and detailed process, so it is impressive that humans have successfully “taught” computers to carry it out and put it to practical use!
On the theoretical side, the remarkable progress in computer vision includes enabling a computer to “see” not only other vehicles in motion on the same road and on other roads, for example, but also motionless objects of different shapes and dimensions in different locations. All of these are seen by the car’s computer from different angles and viewpoints.
The authors of this book, Richard Hartley and Andrew Zisserman, are leading experts, and indeed true pioneers, in the field of geometric computer vision.
To give you an overview of the math and the physics required in understanding computer vision, we present below the titles of the 22 chapters that constitute this book.
- Introduction – A Tour of Multiple View Geometry
- Projective Geometry and the Transformations of 2D
- Projective Geometry and the Transformations of 3D
- Estimation – 2D Projective Transformations
- Algorithm Evaluation and Error Analysis
- Camera Models
- Computation of the Camera Matrix P
- More Single View Geometry
- Epipolar Geometry and the Fundamental Matrix
- 3D Reconstruction of Cameras and Structure
- Computation of the Fundamental Matrix F
- Structure Computation
- Scene Planes and Homographies
- Affine Epipolar Geometry
- The Trifocal Tensor
- Computation of the Trifocal Tensor T
- N-Linearities and Multiple View Tensors
- N-View Computational Methods
- Auto-Calibration
- Duality
- Cheirality
- Degenerate Configurations
This is an excellent book on the subject of multiple view geometry. Its two expert authors show how computers are able to reconstruct a real-world scene when “seeing” several images of that scene. Techniques used in projective geometry and photogrammetry are explained and illustrated.
Hartley and Zisserman discuss geometric principles and their algebraic representations in terms of camera projection matrices, the fundamental matrix and the trifocal tensor. The theory and the methods of computing these entities are covered in this book with examples. This is an outstanding and pioneering work on the subject of computer vision.
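For readers unfamiliar with these entities, the short sketch below may give a flavor of what a camera projection matrix and a fundamental matrix do. It is an illustration written for this review, not code from the book, and it rests on simplifying assumptions: both cameras have identity calibration, and the second camera differs from the first only by a translation `t`, in which case the fundamental matrix reduces to the skew-symmetric matrix of `t`. All function and variable names are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def skew(t):
    """Skew-symmetric matrix [t]_x, so that [t]_x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def project(P, X):
    """Project a homogeneous 3D point X (4-vector) with a 3x4 camera matrix P."""
    return mat_vec(P, X)

def epipolar_residual(F, x1, x2):
    """Evaluate x2^T F x1, which is zero for a true correspondence."""
    return sum(a * b for a, b in zip(x2, mat_vec(F, x1)))

# Assumed setup: first camera P1 = [I | 0], second camera P2 = [I | t].
t = [1.0, 0.0, 0.0]
P1 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0]]
P2 = [[1.0, 0.0, 0.0, t[0]],
      [0.0, 1.0, 0.0, t[1]],
      [0.0, 0.0, 1.0, t[2]]]

# Under the assumptions above, the fundamental matrix is simply [t]_x.
F = skew(t)

X = [0.5, -0.2, 4.0, 1.0]   # a scene point in homogeneous coordinates
x1 = project(P1, X)         # its image in the first view
x2 = project(P2, X)         # its image in the second view
residual = epipolar_residual(F, x1, x2)

# A point that is NOT the match of x1 generally violates the constraint.
x2_wrong = [x2[0], x2[1] + 1.0, x2[2]]
residual_wrong = epipolar_residual(F, x1, x2_wrong)

print(residual, residual_wrong)
```

The two images of the same scene point satisfy the epipolar constraint, so `residual` is zero up to rounding, while the deliberately mismatched point gives a non-zero value. With general calibrated or uncalibrated cameras the fundamental matrix takes a more involved form, which is the kind of material the book develops in detail.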
Authors:
Richard Hartley is affiliated with the Australian National University in Canberra, Australia.
Andrew Zisserman is affiliated with the University of Oxford in Oxford, the United Kingdom.