Author: Simon J.D. Prince, PhD
Publisher: Cambridge University Press – 580 pages
Book Review by: Sonu Chandiram
Computer vision is widely used in today’s world of face, finger, gesture, handwriting, image, and speech recognition. Its use is expanding further with the advent of self-driving cars, which we hope will reduce accidents significantly.
The number of practical uses of computer vision is growing dramatically, as are the capabilities of the technology. In the not-too-distant future, computer vision combined with artificial intelligence may enable human beings to perform tasks we do not even envision today.
Andrew Fitzgibbon of Microsoft Research, author of the Foreword to this book, writes: “I want you to read this book because it makes clear the most important distinction in computer vision research: the difference between model and algorithm.”
Readers may not know that there are already many books on computer vision. Dr. Prince acknowledges this and asks why we need one more book on the subject.
He writes that most books on computer vision focus on the topics of object recognition and stereo vision, and asks if this is really the way we should organize our knowledge on computer vision. On the topic of object recognition, Dr. Prince writes that a variety of methods – e.g. bag of words models, boosting methods, constellation models, and subspace methods – have been applied to this task.
But these methods have little in common, he asserts. He writes: “Any attempt to describe the grand sweep of our knowledge devolves into an instructional list of techniques. How can we make sense of it all to a new student? I will argue for a different way to organize our knowledge.”
He then goes on to describe the various elements involved in object recognition and stereo vision and the relationships among them, and proposes a new and more efficient way to approach computer vision problems. The elements include the vision problem or goal, the world state (for example, the depth of a scene or the presence or absence of an object), and the measurements (for example, RGB values), along with the relationships among them.
Dr. Prince writes: “We observe an image and from this we extract measurements. For example, we might use the RGB values directly or we might filter the image or perform some more sophisticated processing. The vision problem or goal is to use the measurements to infer the world state. For example in stereo vision we try to infer the depth of the scene. In object detection, we attempt to infer the presence or absence of a particular class of object.”
Finally, he writes: “To accomplish the goal, we build a model. The model describes a family of statistical relationships between the measurements and the world state.”
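To make the idea of a model relating measurements to the world state concrete, here is a minimal sketch in Python. It is not taken from the book: it assumes a single scalar measurement x, a binary world state (object absent or present), and normal likelihoods whose means and standard deviations are hypothetical values chosen only for illustration. Bayes' rule then turns the measurement into a probability that the object is present.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_present(x, prior_present=0.5,
                      absent=(0.0, 1.0), present=(2.0, 1.0)):
    """Return Pr(world state = present | measurement x) via Bayes' rule.

    `absent` and `present` are hypothetical (mean, std) pairs describing
    the likelihood Pr(x | world state); they are illustrative assumptions.
    """
    joint_present = normal_pdf(x, *present) * prior_present
    joint_absent = normal_pdf(x, *absent) * (1.0 - prior_present)
    # Normalize so the two posterior probabilities sum to one.
    return joint_present / (joint_present + joint_absent)


if __name__ == "__main__":
    for measurement in (-1.0, 1.0, 3.0):
        print(measurement, posterior_present(measurement))
```

With these assumed numbers, a measurement close to 2 favors "present" and a measurement close to 0 favors "absent". In practice the parameters would be learned from training data rather than fixed by hand.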
To give you an overview of what this book covers, here are the titles of its 20 chapters, grouped under the six parts into which the book is divided:
- Introduction
- Probability
  - Introduction to probability
  - Common probability distributions
  - Fitting probability models
  - The normal distribution
- Machine learning for machine vision
  - Learning and inference in vision
  - Modeling complex data densities
  - Regression models
  - Classification models
- Connecting local models
  - Graphical models
  - Models for chains and trees
  - Models for grids
- Preprocessing
  - Image preprocessing and feature extraction
- Models for geometry
  - The pinhole camera
  - Models for transformations
  - Multiple cameras
- Models for vision
  - Models for shape
  - Models for style and identity
  - Temporal models
  - Models for visual words
To end this review, I will quote William T. Freeman of the Massachusetts Institute of Technology, who writes on the back cover of this book:
“Computer vision and machine learning have gotten married and this book is their child. It gives you the machine learning fundamentals you need to participate in current computer vision research.”
Author:
Simon J.D. Prince, PhD is a faculty member in the Department of Computer Science at University College London. He has taught courses on machine vision, image processing, and advanced mathematical methods. He has a diverse background in biological and computing sciences and has published papers across the fields of computer vision, biometrics, psychology, medical imaging, computer graphics, and human-computer interaction (HCI).