
What is uncertainty sampling?

Uncertainty sampling is an active learning strategy that selects the unlabeled items a model is least certain about and sends them to humans for labeling. The aim is to achieve better model accuracy with fewer annotations by spending effort where it moves the decision boundary the most. It is often referred to by the names of its specific variants, such as least-confidence or margin sampling.

The approach has several classic variants, illustrated in the scoring sketch after this list:

  1. Least-confidence picks examples with the lowest predicted probability for the top class.
  2. Margin sampling chooses items where the gap between the two most likely classes is small.
  3. Entropy-based sampling targets items with the highest output entropy across classes.
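A minimal NumPy sketch of the three scores, assuming probs is an array of shape (n_items, n_classes) whose rows are predicted class probabilities; the function name and shapes are illustrative:

    import numpy as np

    def uncertainty_scores(probs: np.ndarray) -> dict:
        """Classic uncertainty scores; higher means more uncertain."""
        top_two = np.sort(probs, axis=1)[:, ::-1][:, :2]   # two largest probs per row
        return {
            "least_confidence": 1.0 - top_two[:, 0],       # 1 - P(top class)
            "margin": -(top_two[:, 0] - top_two[:, 1]),    # small gap -> high score
            "entropy": -np.sum(probs * np.log(probs + 1e-12), axis=1),
        }

Ranking the pool by any of these scores and taking the top k items yields the round's labeling queue.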

All three focus on ambiguous cases that benefit most from expert input, which is why uncertainty strategies appear in most surveys of active learning methods. Batch-mode variants select a group of points per round, and stopping criteria monitor validation accuracy to decide when further labeling no longer pays off.
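As a sketch of one batch-mode round, scikit-learn's LogisticRegression stands in below for the model and entropy for the score; the variable names, batch size, and classifier choice are all assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    def run_round(X_labeled, y_labeled, X_pool, X_val, y_val, batch_size=1000):
        """Retrain, score the pool by entropy, and pick the next batch."""
        model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
        probs = model.predict_proba(X_pool)
        entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
        batch_idx = np.argsort(entropy)[-batch_size:]          # most uncertain items
        val_acc = accuracy_score(y_val, model.predict(X_val))  # watch for a plateau
        return batch_idx, val_acc

Stopping once val_acc plateaus across a few consecutive rounds is one common criterion.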

In annotation operations, uncertainty sampling often rides atop model-assisted labeling. A pre-trained model proposes labels and confidence scores; a router sends low-confidence items to skilled reviewers, while high-confidence items are auto-accepted or given lightweight spot checks. This triage reduces labeling volume and shortens cycle time without compromising training signal quality.
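A sketch of the routing step; the 0.85 threshold and 5% spot-check rate are illustrative values that would be tuned against audit results and reviewer capacity:

    import random

    def route(confidence: float, threshold: float = 0.85,
              spot_check_rate: float = 0.05) -> str:
        """Triage a model-proposed label by its confidence score."""
        if confidence < threshold:
            return "human_review"     # low confidence: skilled reviewer
        if random.random() < spot_check_rate:
            return "spot_check"       # lightweight quality check
        return "auto_accept"          # high confidence: accept as-is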

Example:

A document-classification system estimates class probabilities for 500,000 unlabeled emails. The team labels only the 40,000 most uncertain messages by entropy, adds 10,000 randomly selected messages for coverage, and retrains. Accuracy improves compared to labeling 50,000 random emails, and the model converges faster.
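A sketch of that mixed selection, with random softmax outputs standing in for the model's predictions (the class count and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    logits = rng.normal(size=(500_000, 4))                  # stand-in predictions
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    uncertain = np.argsort(entropy)[-40_000:]               # 40k most uncertain
    rest = np.setdiff1d(np.arange(len(probs)), uncertain)
    coverage = rng.choice(rest, size=10_000, replace=False) # 10k random for coverage
    to_label = np.concatenate([uncertain, coverage])        # 50k labeling queue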