Deformable Contour Tracking using PF-MT (Particle Filter with Mode Tracker)
(Collaborators: Anthony Yezzi, Yogesh Rathi and Allen Tannenbaum at Georgia Tech)
Details:
We consider the problem of tracking the boundary contour of a moving and
deforming object from a sequence of images. If the motion of the “object”
or region of interest is constrained (e.g. rigid or approximately rigid),
the contour motion can be efficiently represented by a small number of
parameters, e.g. those of the affine group. But if the “object” is
arbitrarily deforming, each contour point can move independently. Contour
deformation then forms an infinite-dimensional (in practice, very
high-dimensional) space. Direct application of particle filters to
high-dimensional problems is impractical, because the effective particle
size shrinks as the dimension increases. But in most real problems, at any
given time, “most of the contour deformation” occurs in a small number of
dimensions (the “effective basis”), while the residual deformation in the
rest of the state space (the “residual space”) is “small”. The effective
basis may be fixed or time varying. Based on this assumption, we modify
the particle filtering method to perform sequential importance sampling
only on the effective basis dimensions, and replace it with deterministic
mode tracking in the residual space (PF-MT). We develop the PF-MT idea for
contour tracking.
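
The split between importance sampling and mode tracking can be illustrated
on a toy problem. The sketch below is not the implementation from the
papers: it assumes a made-up linear Gaussian model in which every contour
point shares one scalar translation `s` (the effective basis) and has its
own small residual `r[j]`, observed as `y[j] = s + r[j] + noise`. The names
`pf_mt_step`, `q_s`, `q_r`, `obs_var` and `effective_sample_size` are
hypothetical. Under these assumptions the conditional posterior mode of
each residual has a closed form, so no numerical optimizer is needed.

```python
import math
import random


def effective_sample_size(weights):
    """Effective particle size 1 / sum(w_i^2) for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def pf_mt_step(particles, weights, y, q_s=1.0, q_r=0.01, obs_var=0.04,
               rng=random):
    """One PF-MT style update for the toy model y[j] = s + r[j] + noise.

    particles: list of (s, r) pairs, s a float, r a list of floats.
    q_s, q_r:  assumed random-walk variances of s and of each r[j].
    obs_var:   assumed observation noise variance.
    """
    gain = q_r / (q_r + obs_var)
    new_particles, log_w = [], []
    for (s_prev, r_prev), w_prev in zip(particles, weights):
        # Importance sample only the effective-basis state from its prior.
        s = s_prev + rng.gauss(0.0, math.sqrt(q_s))
        # Mode tracking: set each residual to the minimizer of
        # (y_j - s - r)^2 / (2 obs_var) + (r - r_old)^2 / (2 q_r).
        r, cost = [], 0.0
        for y_j, r_old in zip(y, r_prev):
            r_new = r_old + gain * (y_j - s - r_old)
            cost += (y_j - s - r_new) ** 2 / (2.0 * obs_var)
            cost += (r_new - r_old) ** 2 / (2.0 * q_r)
            r.append(r_new)
        new_particles.append((s, r))
        # Weight by the posterior cost evaluated at the mode.
        log_w.append(math.log(max(w_prev, 1e-300)) - cost)
    top = max(log_w)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    return new_particles, [u / total for u in unnorm]


rng = random.Random(0)
particles = [(0.0, [0.0, 0.0, 0.0]) for _ in range(50)]
weights = [1.0 / 50] * 50
particles, weights = pf_mt_step(particles, weights, [2.0, 2.1, 1.9], rng=rng)
print(effective_sample_size(weights))
```

Only `s` is sampled, so the particle count has to cover a one-dimensional
space no matter how many contour points carry a residual. A full
implementation would also resample when the effective particle size drops
too low; that step is omitted here.
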

Deforming contours occur either because of a changing region of partial
occlusion or because the object of interest is actually deforming its
shape over a time or space sequence of images. Examples of the second kind
are a beating heart, moving animals or humans, or the cross-sections of
different parts of a 3D object like the brain in consecutive MRI slices.
Most biological images contain deforming objects/regions. Contour tracking
has many applications in medical image analysis, e.g. sequential
segmentation of volume images, tracking heart regions, or image-guided
surgery. The observation likelihood is often multimodal due to background
objects (clutter) that are partially occluded by the “object of interest”,
due to an object that partially occludes the “object of interest”, or due
to low-contrast imagery. Heavy-tailed and often multimodal observation
likelihoods occur when the observation noise has occasional outliers.

In our initial work (CVPR 2005, PAMI 2007), we treated the 6-dimensional
space of affine deformations as the “effective basis”, while the space of
non-affine deformations was the residual space. The implicit assumption is
that the posterior of the non-affine deformation (conditioned on the
affine deformation and the current image) is unimodal. This is valid for
many practical problems where the non-affine deformation per frame is
small, e.g. a rigid object tracked by a perspective camera with frequent
viewpoint changes, or approximately rigid objects, e.g. a human body
contour seen from a distance.
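
To make the “6-dimensional” count concrete, the sketch below applies a 2D
affine map to a list of contour points and then adds a per-point residual.
It is only an illustration: the parameter ordering
`(a11, a12, a21, a22, tx, ty)` and the function names `apply_affine` and
`deform` are assumptions, and the papers may parametrize the affine group
differently.

```python
def apply_affine(contour, params):
    """Map each (x, y) contour point through x' = A x + t.

    params = (a11, a12, a21, a22, tx, ty): six numbers, one per affine
    degree of freedom (an assumed ordering, for illustration only).
    """
    a11, a12, a21, a22, tx, ty = params
    return [(a11 * x + a12 * y + tx, a21 * x + a22 * y + ty)
            for x, y in contour]


def deform(contour, params, residual):
    """Affine motion of the whole contour plus a per-point residual."""
    moved = apply_affine(contour, params)
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(moved, residual)]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
identity = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
shifted = apply_affine(square, (1.0, 0.0, 0.0, 1.0, 2.0, -1.0))
```

The six numbers move every contour point at once, while `residual` carries
one displacement per point. In this setting a particle filter would sample
`params`, and the residual would be set by mode tracking.
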

But in other situations, where local deformations are large, there may be
more than one non-affine mode for the same affine deformation value and
the same image, i.e. the posterior of the non-affine deformation may be
multimodal. Example applications are a rigid object undergoing partial
occlusions, e.g. a car going under a light pole, or tracking regions of
interest in low-contrast medical images (multiple nearby contour modes due
to the low contrast). Such applications also require importance sampling
on the space of non-affine deformations. In recent work (CDC 2006; Trans.
IP, accepted 2008), we use global translation and the deformation velocity
at subsampled contour locations, interpolated using a B-spline basis, as
the effective basis. The effective basis dimension is allowed to change
with time. We are able to get excellent results with as few as K=6
subsampled points, which is much smaller than the total number of contour
points, M=150-200. In other words, the deformation “signal” is
approximately bandlimited (spatially), with the approximate cut-off
frequency being much smaller than the maximum measurable frequency, 0.5
cycles/pixel. We can increase K if the approximate cut-off frequency
increases.
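
As a rough illustration of how K values can drive M contour points, the
sketch below expands K coefficients into a velocity at each of M points on
a closed contour using a periodic uniform cubic B-spline. Several things
here are assumptions rather than details from the papers: the choice of a
cubic spline, uniform knot spacing along the contour, treating the K values
as spline coefficients (a B-spline curve passes near, not exactly through,
its coefficients), and the names `cubic_bspline` and `expand_velocity`.

```python
def cubic_bspline(u):
    """Uniform cubic B-spline kernel, nonzero only for |u| < 2."""
    u = abs(u)
    if u < 1.0:
        return 2.0 / 3.0 - u * u + 0.5 * u ** 3
    if u < 2.0:
        return (2.0 - u) ** 3 / 6.0
    return 0.0


def expand_velocity(coeffs, num_points):
    """Spread K spline coefficients over num_points points of a closed contour.

    Requires len(coeffs) >= 4 so the periodic wrap never overlaps itself.
    """
    k_count = len(coeffs)
    half = k_count / 2.0
    values = []
    for m in range(num_points):
        t = m * k_count / num_points  # position in knot units, [0, K)
        v = 0.0
        for k, c in enumerate(coeffs):
            # Signed distance to knot k, wrapped around the closed contour.
            d = (t - k + half) % k_count - half
            v += c * cubic_bspline(d)
        values.append(v)
    return values


velocity = expand_velocity([0.5, 1.0, 0.0, -1.0, -0.5, 0.0], 150)
```

With K=6 and M=150 the knots sit 25 contour points apart, so under the
uniform-spacing assumption they can represent spatial frequencies only up
to 1/(2*25) = 0.02 cycles per contour point, well below the 0.5 limit set
by the full contour sampling. Raising K moves that limit up.
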