Data distillation makes omni-supervised learning possible

WHAT THE RESEARCH IS:

An investigation of omni-supervised learning, a type of semi-supervised learning in which a model is trained on all available manually labeled data (supervised data) together with large amounts of unlabeled data (unsupervised data), and which therefore has the potential to outperform state-of-the-art fully supervised methods. The proposed approach, data distillation, is a simple method for training an AI model in this omni-supervised setting.

HOW IT WORKS:

Data distillation builds on the classic idea of self-training: making predictions on unlabeled data and using them to update the model, so that the model effectively fills in gaps in its own training data. The approach follows a four-step sequence: training a model on a large amount of supervised data; applying the trained model to multiple transformed copies of the unlabeled data; ensembling those predictions to generate labels for the unlabeled data; and, finally, retraining the model on the union of the supervised and the (now automatically labeled) unlabeled data. The researchers found that models trained with data distillation performed better than those trained with the supervised data alone.
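To make the four steps concrete, here is a minimal sketch in PyTorch. The toy classifier, the random image-like tensors, the horizontal-flip transform, and the 0.9 confidence threshold are all illustrative placeholders rather than the detection models and transformations used in the paper; only the structure of the loop (train, predict on transformed unlabeled data, ensemble into labels, retrain on the union) follows the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset


def make_model(num_classes=10):
    # Placeholder classifier over 8x8 single-channel "images".
    return nn.Sequential(nn.Flatten(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, num_classes))


def train(model, dataset, epochs=5, lr=1e-3):
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model


# Step 1: train on the manually labeled (supervised) data.
labeled_x = torch.randn(1000, 1, 8, 8)
labeled_y = torch.randint(0, 10, (1000,))
unlabeled_x = torch.randn(5000, 1, 8, 8)  # the unlabeled pool
model = train(make_model(), TensorDataset(labeled_x, labeled_y))

# Steps 2-3: apply the trained model to transformed copies of the unlabeled
# data (identity and horizontal flip here) and ensemble the predictions into
# automatically generated labels.
transforms = [lambda x: x, lambda x: torch.flip(x, dims=[3])]
model.eval()
with torch.no_grad():
    probs = torch.stack([F.softmax(model(t(unlabeled_x)), dim=1) for t in transforms]).mean(dim=0)
pseudo_y = probs.argmax(dim=1)

# Keep only confident predictions so noisy automatic labels don't dominate
# (the 0.9 threshold is an arbitrary illustrative choice).
keep = probs.max(dim=1).values > 0.9
auto_labeled = TensorDataset(unlabeled_x[keep], pseudo_y[keep])

# Step 4: retrain on the union of the supervised and automatically labeled data.
model = train(make_model(), ConcatDataset([TensorDataset(labeled_x, labeled_y), auto_labeled]))
```

The key point the sketch tries to capture is that the ensembled predictions are converted into ordinary labels and then treated the same way as human annotations during retraining, which is what the union of the two datasets in the last step reflects.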

WHY IT MATTERS:

These experiments demonstrate that omni-supervised learning can surpass the results achieved with large-scale fully supervised learning. Combining traditional curated training data with a large amount of unlabeled data is a viable strategy, and it opens up the possibility of drawing on real-world data sources to speed the development of AI systems.

READ THE FULL PAPER:

Data Distillation: Towards Omni-Supervised Learning
