What is Image Matting
Natural image matting is a fundamental computer vision task that aims to predict an alpha matte of the foreground object in a given image. An image can be represented as a linear combination of foreground and background, blended by a parameter alpha. Mathematically, image matting is formulated as:
I_i = α_i F_i + (1 − α_i) B_i ,

where I_i, α_i, F_i and B_i are the observed colour value, alpha value, foreground value and background value of pixel i, respectively.
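The compositing equation above can be verified numerically. The sketch below uses hypothetical 1x1 RGB foreground and background colours and a 50% alpha to show how the observed pixel is formed:

```python
import numpy as np

# Hypothetical single-pixel example of I_i = alpha_i * F_i + (1 - alpha_i) * B_i.
F = np.array([[[1.0, 0.0, 0.0]]])   # pure red foreground (1x1 RGB)
B = np.array([[[0.0, 0.0, 1.0]]])   # pure blue background (1x1 RGB)
alpha = np.array([[[0.5]]])         # 50% foreground opacity at this pixel

I = alpha * F + (1.0 - alpha) * B   # observed composite colour
print(I)                            # [[[0.5 0.  0.5]]]
```

Broadcasting lets the same one-liner composite full images when `alpha` has shape `(H, W, 1)` and `F`, `B` have shape `(H, W, 3)`.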
The matting task estimates alpha to extract the foreground. This is an ill-posed problem: per pixel there are only three known values (the observed RGB colour) but seven unknowns (alpha plus three foreground and three background channels). An additional challenge is that there is no consistent definition of "foreground" in an image; it is highly subjective and depends on the scene.
While similar to semantic segmentation, matting generates a more natural and delicate foreground. It plays a central role in downstream tasks like image editing, advertising, background removal and background replacement. In video production, it is common to remove the background for visual effects and composite a new background in its place. Typically the alpha matte is computed against a blue (or green) background for easier separation.
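When the background colour is known and the foreground colour can be calibrated, as in a green-screen shoot, the compositing equation can be inverted for alpha directly. A minimal sketch, with hypothetical pixel values, using a least-squares estimate across the three channels:

```python
import numpy as np

# Hypothetical known colours: with F and B known, I = alpha*F + (1-alpha)*B
# rearranges to (I - B) = alpha * (F - B), solvable for alpha in least squares.
F = np.array([0.9, 0.8, 0.2])    # known foreground colour at this pixel
B = np.array([0.1, 0.9, 0.1])    # known (green) background colour
I = np.array([0.5, 0.85, 0.15])  # observed composite pixel

diff = F - B
# Dot-product form averages information across channels and is robust
# to individual channels where F and B happen to coincide.
alpha = float(np.dot(I - B, diff) / np.dot(diff, diff))
alpha = min(max(alpha, 0.0), 1.0)
print(alpha)                     # 0.5
```

This only works because two of the seven unknowns groups (F and B) were pinned down in advance; in natural matting neither is known, which is what makes the problem ill-posed.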
Matting methods have evolved to handle such complex scenarios by using auxiliary inputs like trimaps, depth maps or scribbles to guide the matting process. More recently, some methods extract the relevant alpha matte fully automatically.
Key Terms
Trimap - Natural matting algorithms often require a user-generated segmentation that marks background, foreground and unknown regions; this segmentation is called a trimap. In general, trimaps must be drawn by hand, either for each frame or at keyframes. The trimap guides the network to refine only the unknown regions and produce a fine-grained alpha matte.
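When no hand-drawn trimap exists, a common shortcut is to derive one from a binary segmentation mask by eroding it (confident foreground) and dilating it (confident background), leaving an unknown band in between. A sketch, assuming a toy boolean mask and the usual 0/128/255 encoding:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def mask_to_trimap(mask: np.ndarray, width: int = 2) -> np.ndarray:
    """Derive a trimap from a binary foreground mask.
    Encoding: 0 = background, 128 = unknown band, 255 = foreground."""
    fg = binary_erosion(mask, iterations=width)    # shrink -> confident foreground
    bg = ~binary_dilation(mask, iterations=width)  # grow, invert -> confident background
    trimap = np.full(mask.shape, 128, dtype=np.uint8)
    trimap[fg] = 255
    trimap[bg] = 0
    return trimap

# Toy 8x8 mask with a 4x4 square foreground region.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
print(mask_to_trimap(mask, width=1))
```

The `width` parameter controls how generous the unknown band is; wider bands give the matting network more room to recover fine structure at the cost of more pixels to resolve.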
Evaluation - standard metrics compare the predicted alpha matte against a ground-truth matte:
- SAD - sum of absolute differences
- MSE - mean squared error
- GRAD - gradient error, penalising differences in the matte's spatial gradients
- CONN - connectivity error, penalising disconnected foreground regions
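The two simplest metrics are straightforward to compute. A minimal sketch with hypothetical 2x2 mattes (SAD is conventionally reported divided by 1000 on matting benchmarks; GRAD and CONN involve gradient filters and connectivity analysis and are omitted here):

```python
import numpy as np

def sad(pred: np.ndarray, gt: np.ndarray) -> float:
    """Sum of absolute differences, scaled by 1/1000 as is conventional."""
    return float(np.abs(pred - gt).sum() / 1000.0)

def mse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean squared error over all pixels."""
    return float(np.mean((pred - gt) ** 2))

# Hypothetical predicted and ground-truth alpha mattes in [0, 1].
pred = np.array([[0.9, 0.1], [0.5, 0.0]])
gt   = np.array([[1.0, 0.0], [0.5, 0.0]])
print(sad(pred, gt), mse(pred, gt))   # 0.0002 0.005
```

In practice these are often evaluated only over the trimap's unknown region, since the known regions are trivially correct.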
Types of Image Matting
Pre deep learning - Conventional matting methods relied on assumptions such as local smoothness (colour sampling) or structural affinity (affinity matrices). Because such low-level colour cues are fragile, these methods struggled with the large variation found in complex images. Deep matting methods evolved to overcome this dilemma.
DL based - the focus of this article. These methods fall into two groups:
- Automatic
- Guided
Deep Architectures
- Single stage - DIM, FBA Matting
- Dual stage - SHM
- Multi-branch - PP-Matting, DIS
Transformers!
- MatteFormer - first use of a transformer for the matting task
- ViTMatte - introduced a lightweight decoder branch inspired by ViTDet, reducing the decoder from 18M to 2.5M
Conclusion
- Methods with auxiliary guidance are still superior
- Transformers with attention mechanisms are closing the gap, addressing global vs local consistency
References
- List of matting resources - awesome-image-matting