Last modified: 2024-10-10 16:12
Background
Pseudo-labeling for deep SSL goes back at least to Lee et al.'s workshop paper in 2013. Here, we will consider the "Curriculum Pseudo-labeling" method proposed by Cascante-Bonilla et al. (AAAI 2021). You can find concise pseudocode for their original approach in Alg. 1 of that paper.
Here, we develop a simplified version of Curriculum Pseudo-labeling for our two half moons dataset in HW2.
This procedure is a very easy way to do semi-supervised learning: it needs no dedicated SSL training procedure, because we just reuse the existing supervised training procedure.
Pseudocode: Curriculum Pseudo-labeling for semi-supervised learning
Inputs:
- N : num labeled instances in dataset for training
- U : num unlabeled instances in dataset for training
- F : num feature dimensions in raw input
- x_NF (\(X^L\)) : feature vectors for labeled set, as tensor of shape (N,F)
- y_N (\(Y^L\)) : class labels for labeled set, as tensor of shape (N,)
- xunlab_UF (\(X^U\)) : feature vectors for unlabeled set, as tensor of shape (U,F)
Return Values:
- model (\(\theta\)) : parameters of neural net classifier
Procedure:
# Initialization: Fit model to labeled-set-only
\(\theta \gets \text{TrainSuper}(X^L, Y^L)\)
# Two-phase curriculum with pseudo-labels
for quartile-threshold \(\kappa\) in [0.5, 0.25]:
    \(\hat{X}, \hat{Y} \gets \text{MakePseudoLabelForMostConfidentFraction}(X^U, \theta, \kappa)\)
    \(\theta \gets \text{TrainSuper}(X^L \cup \hat{X}, Y^L \cup \hat{Y})\)
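Here, \(\kappa\) is the desired quartile (more generally, quantile) of prediction confidence at which we threshold to decide which instances we keep and which we discard. Instances at or above the threshold are kept, so a fraction of about \(1 - \kappa\) of each class is retained. Setting \(\kappa=0.5\) would retain the top 50% most confident predictions of each class. Setting \(\kappa=0.2\) would retain the top 80%.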
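The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation for HW2 or the original paper's code: the names make_pseudo_labels, curriculum_pseudo_label, train_centroids, and centroid_proba are hypothetical, class labels are assumed to be integers 0..C-1, TrainSuper is assumed to refit from scratch each phase, and a nearest-centroid classifier stands in for the neural net so the example runs without a deep learning library.

```python
import numpy as np

def make_pseudo_labels(proba_UC, kappa):
    """Pick the most confident unlabeled instances, class by class.

    Within each predicted class, keep instances whose confidence is at or
    above that class's kappa-quantile (roughly the top 1 - kappa fraction).
    Returns (indices of kept rows, hard pseudo-labels for those rows).
    """
    yhat_U = proba_UC.argmax(axis=1)   # hard pseudo-label per instance
    conf_U = proba_UC.max(axis=1)      # confidence of that pseudo-label
    keep_U = np.zeros(yhat_U.shape[0], dtype=bool)
    for c in np.unique(yhat_U):
        in_class = yhat_U == c
        thresh = np.quantile(conf_U[in_class], kappa)
        keep_U |= in_class & (conf_U >= thresh)
    keep_ids = np.flatnonzero(keep_U)
    return keep_ids, yhat_U[keep_ids]

def curriculum_pseudo_label(train_super, predict_proba,
                            x_NF, y_N, xunlab_UF, kappas=(0.5, 0.25)):
    # Initialization: fit model to labeled set only
    model = train_super(x_NF, y_N)
    # Curriculum: each phase admits a larger fraction of pseudo-labels
    for kappa in kappas:
        proba_UC = predict_proba(model, xunlab_UF)
        keep_ids, yhat = make_pseudo_labels(proba_UC, kappa)
        x_aug = np.concatenate([x_NF, xunlab_UF[keep_ids]], axis=0)
        y_aug = np.concatenate([y_N, yhat], axis=0)
        model = train_super(x_aug, y_aug)  # assumption: refit from scratch
    return model

# --- Stand-in classifier (assumption: replaces the neural net) ---
def train_centroids(x, y):
    classes = np.unique(y)
    return classes, np.stack([x[y == c].mean(axis=0) for c in classes])

def centroid_proba(model, x):
    _, mu_CF = model
    d2_UC = ((x[:, None, :] - mu_CF[None, :, :]) ** 2).sum(axis=2)
    logits = -d2_UC
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy usage: two labeled points, forty unlabeled points in two blobs
rng = np.random.default_rng(0)
x_NF = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_N = np.array([0, 1])
xunlab_UF = np.concatenate([rng.normal(-2.0, 0.5, (20, 2)),
                            rng.normal(2.0, 0.5, (20, 2))])
model = curriculum_pseudo_label(train_centroids, centroid_proba,
                                x_NF, y_N, xunlab_UF)
```

Passing train_super and predict_proba as callables mirrors the point made earlier: the curriculum wraps whatever supervised training routine already exists, so swapping in a neural net only requires replacing those two functions.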
Key differences from the original paper
- We use just two phases, not the five they recommend, mostly to save training time.
- When building the pseudolabeled subset for a given \(\kappa\), we found that determining a separate probability threshold for each class was useful, especially if one class has much higher probabilities than another.
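To illustrate the second point, the sketch below compares a single global threshold with per-class thresholds on made-up confidence values. The numbers are hypothetical and chosen so that the model is much more confident on class 0 than on class 1. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical predictions: class 0 gets much higher confidences than class 1
yhat_U = np.array([0, 0, 0, 0, 1, 1, 1, 1])
conf_U = np.array([0.99, 0.97, 0.96, 0.95, 0.70, 0.66, 0.62, 0.60])
kappa = 0.5

# Option A: one global threshold at the kappa-quantile of all confidences
global_keep = conf_U >= np.quantile(conf_U, kappa)

# Option B: a separate kappa-quantile threshold within each predicted class
per_class_keep = np.zeros_like(global_keep)
for c in np.unique(yhat_U):
    in_class = yhat_U == c
    thresh_c = np.quantile(conf_U[in_class], kappa)
    per_class_keep |= in_class & (conf_U >= thresh_c)

# Count how many pseudo-labels of each class survive under each option
kept_global = np.bincount(yhat_U[global_keep], minlength=2)
kept_per_class = np.bincount(yhat_U[per_class_keep], minlength=2)
print("global threshold keeps per class:   ", kept_global)     # [4 0]
print("per-class thresholds keep per class:", kept_per_class)  # [2 2]
```

With the global threshold, every retained pseudo-label belongs to class 0, so the augmented training set gains nothing for class 1. Per-class thresholds keep the top half of each class instead.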