Jacobus J Barnard
Publications
PMID: 16886866; Abstract:
This paper argues that tracking, object detection, and model building are all similar activities. We describe a fully automatic system that builds 2D articulated models known as pictorial structures from videos of animals. The learned model can be used to detect the animal in the original video; in this sense, the system can be viewed as a generalized tracker (one that is capable of modeling objects while tracking them). The learned model can be matched to a visual library; here, the system can be viewed as a video recognition algorithm. The learned model can also be used to detect the animal in novel images; in this case, the system can be seen as a method for learning models for object recognition. We find that we can significantly improve the pictorial structures by augmenting them with a discriminative texture model learned from a texture library. We develop a novel texture descriptor that outperforms the state-of-the-art for animal textures. We demonstrate the entire system on real video sequences of three different animals. We show that we can automatically track and identify the given animal. We use the learned models to recognize animals from two data sets: images taken by professional photographers from the Corel collection, and assorted images from the Web returned by Google. We demonstrate quite good performance on both data sets. Comparing our results with simple baselines, we show that, for the Google set, we can detect, localize, and recover part articulations from a collection demonstrably hard for object recognition. © 2006 IEEE.
Abstract:
We work with a model of object recognition where words must be placed on image regions. This approach means that large-scale experiments are relatively easy, so we can evaluate the effects of various early and mid-level vision algorithms on recognition performance. We evaluate various image segmentation algorithms by determining word prediction accuracy for images segmented in various ways and represented by various features. We take the view that good segmentations respect object boundaries, and so word prediction should be better for a better segmentation. However, it is usually very difficult in practice to obtain segmentations that do not break up objects, so most practitioners attempt to merge segments to get better putative object representations. We demonstrate that our paradigm of word prediction easily allows us to predict potentially useful segment merges, even for segments that do not look similar (for example, merging the black and white halves of a penguin is not possible with feature-based segmentation; the main cue must be "familiar configuration"). These studies focus on unsupervised learning of recognition. However, we show that word prediction can be markedly improved by providing supervised information for a relatively small number of regions together with large quantities of unsupervised information. This supervisory information allows a better and more discriminative choice of features and breaks possible symmetries.
Abstract:
In this paper we present a comprehensive method for identifying probable shadow regions in an image. Doing so is relevant to computer vision, colour constancy, and image reproduction, specifically dynamic range compression. Our method begins with a segmentation of the image into regions of the same colour. Then the edges between the regions are analyzed with respect to the possibility that each is due to an illumination change as opposed to a material boundary. We then integrate the edge information to produce an estimate of the illumination field.
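The last step described above, integrating edge information into an illumination estimate, can be illustrated with a deliberately simplified sketch. It is not the paper's method: it assumes a one-dimensional chain of regions, assumes each edge has already been classified as an illumination change or a material boundary, and uses hypothetical names (`illumination_field`, `region_means`, `is_illum_edge`). It accumulates log-intensity steps only across edges judged to be illumination changes.

```python
import math


def illumination_field(region_means, is_illum_edge):
    """Relative log-illumination per region along a 1D chain of regions.

    region_means[i]  : mean intensity of region i (positive floats)
    is_illum_edge[i] : True if the edge between regions i and i+1 is judged
                       to be an illumination change rather than a material
                       boundary (the classification itself is assumed given)
    """
    if len(is_illum_edge) != len(region_means) - 1:
        raise ValueError("need one edge flag per adjacent region pair")

    # The first region is the reference, with log-illumination 0.
    field = [0.0]
    for i, illum in enumerate(is_illum_edge):
        # Only illumination edges contribute; material edges add nothing.
        step = math.log(region_means[i + 1] / region_means[i]) if illum else 0.0
        field.append(field[-1] + step)
    return field


if __name__ == "__main__":
    # Edges 0 and 2 are treated as shadow boundaries, edge 1 as a material change.
    print(illumination_field([100.0, 50.0, 20.0, 10.0], [True, False, True]))
```

In a real image the regions form a 2D graph, so the per-edge estimates would generally need to be reconciled around loops rather than simply summed along a path.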
Abstract:
We develop an approach to reduce correspondence ambiguity in training data where data items are associated with sets of plausible labels. Our domain is images annotated with keywords where it is not known which part of the image a keyword refers to. In contrast to earlier approaches that build predictive models or classifiers despite the ambiguity, we argue that it is better to first address the correspondence ambiguity, and then build more complex models from the improved training data. This addresses difficulties of fitting complex models in the face of ambiguity while exploiting all the constraints available from the training data. We contribute a simple and flexible formulation of the problem, and show results validated by a recently developed comprehensive evaluation data set and corresponding evaluation methodology. © 2007 IEEE.
Abstract:
Color is of interest to those working in computer vision largely because it is assumed to be helpful for recognition. This assumption has driven much work in color-based image indexing and computational color constancy. However, in many ways, indexing is a poor model for recognition. In this paper we use a recently developed statistical model of recognition which learns to link image region features with words, based on a large unstructured data set. The system is general in that it learns what is recognizable given the data. It also supports a principled testing paradigm which we exploit here to evaluate the use of color. In particular, we look at color space choice, degradation due to illumination change, and dealing with this degradation. We evaluate two general approaches to dealing with this color constancy problem. Specifically, we address whether it is better to build color variation due to illumination into a recognition system, or, instead, apply color constancy preprocessing to images before they are processed by the recognition system.
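As a concrete picture of the second approach, color constancy preprocessing, here is a minimal sketch of the gray-world method. The abstract does not say which constancy algorithms were evaluated, so gray-world is used purely as a well-known stand-in, and the function name `gray_world` and the pixel-list representation are assumptions made for illustration.

```python
def gray_world(pixels):
    """Scale each channel so its mean matches the overall mean intensity.

    `pixels` is a list of (r, g, b) tuples with non-negative float values.
    Gray-world assumes the average scene color is achromatic, so any
    deviation of the channel means from gray is attributed to the illuminant.
    """
    n = len(pixels)
    if n == 0:
        return []

    # Per-channel means over the whole image.
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0

    # Leave an all-zero channel unchanged to avoid division by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(p[c] * gains[c] for c in range(3)) for p in pixels]


if __name__ == "__main__":
    # A bluish cast: channel means are 0.3, 0.6, 0.9 before correction.
    print(gray_world([(0.2, 0.4, 0.6), (0.4, 0.8, 1.2)]))
```

The alternative the abstract contrasts this with would skip such a step and instead train the recognition system on data that already contains illumination variation.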