The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks
arXiv
TensorFlow / PyTorch implementation
Demo
1. Abstract
1-1. A loss function for direct optimization of the Jaccard index
1-2. Optimizing the Jaccard index in a continuous optimization framework
1-3. Lovász hinge loss: binary image segmentation
1-4. Lovász-Softmax loss: multi-class image segmentation
2. Optimization surrogates for submodular loss functions
2-1. Optimizing the Jaccard loss (choosing a class for each pixel) is a discrete optimization problem over the 2^p possible pixel labelings, and is NP-hard in general.
2-2. The Jaccard set function in Equation (6) has been shown to be submodular (Yu & Blaschko, 2015, "The Lovász Hinge: A Novel Convex Surrogate for Submodular Losses"), so its Lovász extension can be computed in polynomial time.
2-3. The Lovász extension of a set function (here, the Jaccard loss) is given by Equations (8) and (9), written out below.
2-4. By submodularity, the Lovász extension of the set function is its tight convex closure: it is piecewise linear and interpolates the values of the original set function on {0, 1}^p.
2-5. The vector g(m), whose components are defined in Equation (9), is exactly the derivative of the Lovász extension with respect to m.
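
For reference, here is a sketch of the Lovász extension as given by Equations (8) and (9), with notation reconstructed from the paper (Δ is the Jaccard loss, m ∈ R_+^p the vector of pixel errors):

```latex
% Lovász extension of the Jaccard loss \Delta, evaluated at the error vector m
\overline{\Delta}(\mathbf{m}) = \sum_{i=1}^{p} m_{\pi_i}\, g_i(\mathbf{m}),
\qquad
g_i(\mathbf{m}) = \Delta(\{\pi_1, \ldots, \pi_i\}) - \Delta(\{\pi_1, \ldots, \pi_{i-1}\}),
```

where π is a permutation sorting the components of m in decreasing order (m_{π_1} ≥ m_{π_2} ≥ … ≥ m_{π_p}). Each g_i(m) is the increase in Jaccard loss when the pixel with the i-th largest error is added to the set of mispredictions, which is why g(m) is the derivative noted in 2-5.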
3. Lovász hinge
3-1. The figure above shows the Lovász hinge applied to the Jaccard loss.
3-2. Substituting the hinge margin errors m_i = max(1 − F_i(x) y_i*, 0) into Equation (8) yields the Lovász hinge applied to the Jaccard loss.
3-3. In the case of a single prediction (only one pixel), the Lovász hinge reduces to the standard hinge loss.
3-4. In the figure above, the red points indicate values of the Jaccard index; the dots lie on the convex surface of the Lovász hinge.
3-5. During optimization, the network is trained so that the relative margin vector r moves toward the lowest red point (a minimal code sketch follows below).
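
To make 3-1 through 3-5 concrete, here is a minimal NumPy sketch of the binary Lovász hinge. The function names (jaccard_grad, lovasz_hinge) are mine, and details such as applying the ReLU after sorting follow the authors' public reference implementation rather than anything stated in this summary:

```python
import numpy as np

def jaccard_grad(gt_sorted: np.ndarray) -> np.ndarray:
    """g(m) from Eq. (9): increments of the Jaccard loss as pixels are added
    in order of decreasing error. gt_sorted holds the {0, 1} ground-truth
    labels, already sorted by decreasing error."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)   # foreground pixels not yet counted as errors
    union = gts + np.cumsum(1.0 - gt_sorted)    # foreground plus background errors so far
    jaccard = 1.0 - intersection / union        # Jaccard loss of the first i pixels
    jaccard[1:] = jaccard[1:] - jaccard[:-1]    # differences = g_i(m)
    return jaccard

def lovasz_hinge(logits: np.ndarray, labels: np.ndarray) -> float:
    """Binary Lovász hinge: hinge errors m_i, sorted decreasingly, dotted
    with the Jaccard-loss gradient (Eq. (8))."""
    signs = 2.0 * labels - 1.0                  # map {0, 1} -> {-1, +1}
    errors = 1.0 - logits * signs               # raw hinge margins
    order = np.argsort(errors)[::-1]            # indices of decreasing errors
    grad = jaccard_grad(labels[order])
    return float(np.dot(np.maximum(errors[order], 0.0), grad))
```

The single-pixel case of 3-3 falls out directly: with p = 1 the gradient is g_1 = 1, so the loss reduces to max(1 − F·y*, 0), the standard hinge loss.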
4. Lovász-Softmax
4-1. The figure above shows the Lovász-Softmax applied to the Jaccard loss.
4-2. Instead of the max-margin setting used for the Lovász hinge, the softmax probabilities are used to define the errors m_i(c) in Equation (11).
4-3. When targeting the class-averaged mIoU metric, the class-specific loss surrogates are averaged, as in Equation (12) (see the sketch below).
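
A matching sketch for the multi-class case, reusing jaccard_grad from the hinge sketch above. Per Equation (11) the class errors are m_i(c) = 1 − f_i(c) if y_i = c and f_i(c) otherwise, and Equation (12) averages the per-class surrogates (the function name lovasz_softmax is mine):

```python
import numpy as np

def lovasz_softmax(probs: np.ndarray, labels: np.ndarray) -> float:
    """Lovász-Softmax loss. probs: softmax outputs of shape (p, C);
    labels: integer class ids of shape (p,). Uses jaccard_grad() above."""
    p, C = probs.shape
    losses = []
    for c in range(C):
        fg = (labels == c).astype(np.float64)  # ground-truth mask of class c
        errors = np.abs(fg - probs[:, c])      # m_i(c) from Eq. (11)
        order = np.argsort(errors)[::-1]       # decreasing errors
        losses.append(np.dot(errors[order], jaccard_grad(fg[order])))
    return float(np.mean(losses))              # class average, Eq. (12)
```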
5. Optimization of intersection over union
5-1. The Lovász extension (Equation (8)) can be computed by sorting the elements of m in O(p log p) time, followed by O(p) calls to the set function, where p is the number of pixels.
5-2. The algorithm below (Algorithm 1 in the paper) computes the gradient of the Jaccard loss extension, [g_1(m), g_2(m), ..., g_p(m)], in O(p log p) time; a toy example follows.
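
As a quick sanity check on the complexity claim, a toy run of jaccard_grad from the sketches above; the sort of the errors dominates at O(p log p), while the cumulative sums inside jaccard_grad are single O(p) passes:

```python
import numpy as np

# Toy example: 4 pixels whose ground-truth labels have already been
# sorted by decreasing error (assumes jaccard_grad from the sketch above).
gt_sorted = np.array([1.0, 0.0, 1.0, 0.0])
print(jaccard_grad(gt_sorted))  # ~ [0.5, 0.1667, 0.3333, 0.0]
```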
6. References
- Berman, M., Triki, A. R., Blaschko, M. B. (2018). The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. CVPR 2018. arXiv:1705.08790
- Yu, J., Blaschko, M. B. (2015). The Lovász Hinge: A Novel Convex Surrogate for Submodular Losses.