Pseudo-labeling SSL simplified for HW2


Last modified: 2024-10-10 16:12

Background

Pseudo-labeling for deep SSL goes back at least to Lee's workshop paper in 2013. Here, we will consider the "Curriculum Pseudo-labeling" method proposed by Cascante-Bonilla et al. (AAAI 2021). You can find concise pseudocode for their original approach in Alg. 1 of that paper.

Here, we develop a simplified version of Curriculum Pseudo-labeling for our two half moons dataset in HW2.

This procedure is a very easy way to do semi-supervised learning: it needs no dedicated SSL training procedure, because we just reuse the existing supervised training procedure on the labeled set plus a pseudo-labeled subset of the unlabeled set.

Pseudocode: Curriculum Pseudo-labeling for semi-supervised learning

Inputs:

  • N : num labeled instances in dataset for training
  • U : num unlabeled instances in dataset for training
  • F : num feature dimensions in raw input
  • x_NF (\(X^L\)) : feature vectors for labeled set, as tensor of shape (N,F)
  • y_N (\(Y^L\)) : class labels for labeled set, as tensor of shape (N,)
  • xunlab_UF (\(X^U\)) : feature vectors for unlabeled set, as tensor of shape (U,F)

Return Values:

  • model (\(\theta\)) : Parameters of neural net classifier

Procedure:

 
    # Initialization: Fit model to labeled-set-only

          \(\theta \gets \text{TrainSuper}(X^L, Y^L)\)

 
    # Two-phase curriculum with pseudo-labels 

         for quartile-threshold \(\kappa\) in [0.5, 0.25]:
                 \(\hat{X}, \hat{Y} \gets \text{MakePseudoLabelForMostConfidentFraction}(X^U, \theta, \kappa)\)
                 \(\theta \gets \text{TrainSuper}(X^L \cup \hat{X}, Y^L \cup \hat{Y})\)
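The procedure above can be sketched in Python as follows. This is only a minimal sketch: `train_super` and `make_pseudo` are hypothetical caller-supplied functions standing in for TrainSuper and MakePseudoLabelForMostConfidentFraction, and NumPy arrays stand in for tensors. Whether `train_super` restarts from fresh parameters or warm-starts is not specified by the pseudocode and is left to the caller here.

```python
import numpy as np

def curriculum_pseudo_label(x_NF, y_N, xunlab_UF, train_super, make_pseudo,
                            kappas=(0.5, 0.25)):
    """Two-phase curriculum sketch.

    train_super(x, y) -> model            (hypothetical, caller-supplied)
    make_pseudo(xunlab, model, kappa) -> (xhat, yhat)   (hypothetical)
    """
    # Initialization: fit model to the labeled set only
    model = train_super(x_NF, y_N)
    for kappa in kappas:
        # Pseudo-label a confident subset of the unlabeled set
        xhat_MF, yhat_M = make_pseudo(xunlab_UF, model, kappa)
        # Retrain on the labeled set plus this phase's pseudo-labeled subset
        xall = np.concatenate([x_NF, xhat_MF], axis=0)
        yall = np.concatenate([y_N, yhat_M], axis=0)
        model = train_super(xall, yall)
    return model
```

As in the pseudocode, each phase builds its pseudo-labeled subset from the full unlabeled set, rather than accumulating subsets across phases.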

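Here, \(\kappa\) is the quantile (for the two values used above, a quartile) of prediction confidence at which we threshold to decide which instances we keep and which we discard. We keep the instances whose confidence falls above the \(\kappa\)-quantile, that is, the top \(1-\kappa\) fraction. Setting \(\kappa=0.5\) would retain the top 50% most confident predictions of each class. Setting \(\kappa=0.25\) would retain the top 75%, and setting \(\kappa=0.2\) would retain the top 80%.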
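To make the relationship between \(\kappa\) and the retained fraction concrete, here is a small illustration on a single set of confidences. The confidence values are made up, and using `np.quantile` with a `>=` comparison is one reasonable choice, not necessarily what the HW2 starter code does.

```python
import numpy as np

def count_retained(conf_U, kappa):
    """Count instances whose confidence is at or above the kappa-quantile."""
    thresh = np.quantile(conf_U, kappa)  # default: linear interpolation
    return int(np.sum(conf_U >= thresh))

# Hypothetical confidences for U=20 unlabeled instances, all distinct
conf_U = np.linspace(0.51, 0.99, 20)
for kappa in [0.5, 0.25, 0.2]:
    n_keep = count_retained(conf_U, kappa)
    print(f"kappa={kappa}: keep {n_keep} of {conf_U.size}")
```

With these 20 distinct values, the loop keeps 10, 15, and 16 instances respectively, matching the 50%, 75%, and 80% fractions described above.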

Key differences from the original paper

  • We use just two phases, not the five they recommend, mostly to save training time.
  • When building the pseudolabeled subset for a given \(\kappa\), we found that determining a separate probability threshold for each class was useful, especially if the model assigns much higher predicted probabilities to one class than to another.
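The per-class thresholding idea can be sketched as below. This is an illustration under assumptions, not the reference implementation: the function name is hypothetical, and it takes an array of predicted probabilities `proba_UC` (shape (U,C)) directly rather than the model itself.

```python
import numpy as np

def make_pseudo_labels_per_class(xunlab_UF, proba_UC, kappa):
    """Keep, within each predicted class, instances at or above that
    class's own kappa-quantile of confidence."""
    yhat_U = proba_UC.argmax(axis=1)   # predicted class per instance
    conf_U = proba_UC.max(axis=1)      # confidence of that prediction
    keep_U = np.zeros(yhat_U.shape[0], dtype=bool)
    for c in np.unique(yhat_U):
        in_class_U = yhat_U == c
        # Threshold computed only from instances predicted as class c
        thresh_c = np.quantile(conf_U[in_class_U], kappa)
        keep_U |= in_class_U & (conf_U >= thresh_c)
    return xunlab_UF[keep_U], yhat_U[keep_U]

# Hypothetical case: class 0 predictions are far more confident than class 1
proba_UC = np.array([
    [0.99, 0.01], [0.98, 0.02], [0.97, 0.03], [0.96, 0.04],
    [0.40, 0.60], [0.42, 0.58], [0.45, 0.55], [0.48, 0.52],
])
xunlab_UF = np.arange(16, dtype=float).reshape(8, 2)
xhat_MF, yhat_M = make_pseudo_labels_per_class(xunlab_UF, proba_UC, kappa=0.5)
```

In this example, a single global threshold at \(\kappa=0.5\) would keep only class 0 instances, since all four of them are more confident than any class 1 instance. The per-class version keeps the two most confident instances of each class.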