AI at Talenom: Partial Label Learning
In machine learning jargon, supervised learning means that we have a correct label for every example in the training set. The opposite setting is unsupervised learning, in which we have a set of inputs without any labels. Between these two extremes lies weak supervision, in which we have some, but not full, knowledge of the labels. Semi-supervised learning is a subclass of weak supervision in which we have labels for some of the examples while the labels for the rest are missing. Besides these well-known weakly supervised settings there are others as well. In this blog post we cover partial label learning, also known as superset learning. In a partial label learning problem, each training example comes with a set of candidate labels, of which only one is correct. It is crucial to differentiate this setting from multi-label classification, in which a single example may have multiple correct labels. The differences between these learning schemes are illustrated in the figure below, copied from Cour et al. [1].
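To make the distinction concrete, here is a small sketch of how the target vectors differ between the settings (the four classes and the example vectors are made up for illustration):

```python
import numpy as np

# Four classes: [cat, dog, fox, wolf]

# Supervised: exactly one correct label per example (one-hot).
supervised_target = np.array([0, 1, 0, 0])   # the example is a dog

# Partial label: a set of candidate labels, only ONE of which is
# correct, but we do not know which (n-hot over the candidate set).
partial_target = np.array([0, 1, 0, 1])      # a dog or a wolf, not both

# Multi-label: several labels may be correct at the same time.
multilabel_target = np.array([1, 1, 0, 0])   # both a cat and a dog appear
```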
Practical usage examples
Example 1: Tagging faces from videos and photographs
The usage example below, with its accompanying image, is copied directly from Cour et al. [1].
Example 2: Coarse labeling is cheap, accurate labeling is expensive
Imagine an application that identifies plants. It is relatively cheap and fast to obtain coarse-level training data, since annotating something as, for example, a mushroom does not require any expert knowledge. However, annotating plants with more exact names, such as bicolor bolete, requires much more knowledge and time, and is therefore a much more expensive operation. It would be great if we could combine a large amount of coarsely labeled training data with a small amount of accurately labeled training data.
A similar situation occurs in the accounting field. Coarse-level posting is fast and cheap. However, some customers want to track their expenses in more detail, and they are willing to pay for it. As a result, we have a small amount of accurately labeled training data combined with a large amount of coarsely labeled training data.
Solution
Solutions to the partial label learning problem can be divided into three different approaches: identification-based, average-based, and confidence-based [2].
Identification-based approaches treat the true label as a latent variable, try to identify it, and use that single label as the ground truth. Average-based approaches treat all candidate labels as equally probable and output some sort of averaged result. Confidence-based approaches are a mixture of the two: a confidence is estimated for each candidate label instead of committing to one and only one ground-truth value.
We have obtained good results with a simple average-based method that appears to be the same as the one presented by Seo and Huh [3]. The approach requires only a small modification of the standard categorical cross-entropy loss function for neural networks: the probabilities of all candidate labels are summed after the final softmax layer.
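In symbols (the notation here is ours): writing $S$ for the candidate label set of a training example $x$, and $p_j(x)$ for the softmax probability of class $j$, the modified loss is the negative log of the total probability mass assigned to the candidate set:

$$\mathcal{L}(x, S) = -\log \sum_{j \in S} p_j(x).$$

When $S$ contains a single label $y$, this reduces to the standard categorical cross-entropy $-\log p_y(x)$.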
Assuming that the candidate labels are n-hot encoded in the target vector, the code below gives a TensorFlow implementation of the categorical cross-entropy loss modified to the partial label scheme. Note that when there is only one candidate label, the target is one-hot encoded and the implementation matches the standard categorical cross-entropy loss.
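A minimal sketch of such a loss, written against the Keras loss-function interface (the function name and the clipping constant are our own choices):

```python
import tensorflow as tf

def partial_label_categorical_crossentropy(y_true, y_pred):
    """Categorical cross-entropy generalized to partial labels.

    y_true: n-hot candidate label tensor, shape (batch, num_classes)
    y_pred: softmax output, shape (batch, num_classes)
    """
    # Total probability mass the model assigns to the candidate set.
    candidate_mass = tf.reduce_sum(y_true * y_pred, axis=-1)
    # Clip away from zero so the log stays finite early in training.
    candidate_mass = tf.clip_by_value(candidate_mass, 1e-7, 1.0)
    # With a one-hot y_true this is exactly -log p_y, i.e. the
    # standard categorical cross-entropy.
    return -tf.math.log(candidate_mass)

# Usage with Keras, for example:
# model.compile(optimizer="adam", loss=partial_label_categorical_crossentropy)
```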
References
[1] Cour, T., Sapp, B., & Taskar, B. (2011). Learning from Partial Labels. Journal of Machine Learning Research, 1225–1261.
[2] Xu, N., Qiao, C., Geng, X., & Zhang, M. (2021). Instance-Dependent Partial Label Learning. arXiv:2110.12911.