Our baseline method is derived from the Noisy Student learning method, which has been applied to semi-supervised image classification [1] and semi-supervised urban scene segmentation [2]. We employ 3D nnU-Net [3] for both the teacher and student models. The method consists of five main steps:

  • Step 1. Training a teacher model on the manually labelled data.
  • Step 2. Generating pseudo labels of the unlabelled data via the teacher model.
  • Step 3. Training a student model on both manually and pseudo-labelled data.
  • Step 4. Fine-tuning the student model from Step 3 on the manually labelled data.
  • Step 5. Returning to Step 2 with the student model as the new teacher, and repeating for a desired number of iterations.
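
The five steps above can be sketched as a short training loop. This is a minimal illustration of the control flow only, not the released code: the names `noisy_student_loop`, `train`, and `finetune` are hypothetical placeholders, and the two callables stand in for whatever training routine is actually used (for example, a wrapper around nnU-Net training). A model is assumed to be any callable that maps an image to a predicted label.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Assumption: a "model" is any callable mapping an image to a predicted label.
Model = Callable[[Any], Any]
Pair = Tuple[Any, Any]  # (image, label)


def noisy_student_loop(
    labelled: Sequence[Pair],
    unlabelled: Sequence[Any],
    train: Callable[[List[Pair]], Model],            # hypothetical training routine
    finetune: Callable[[Model, List[Pair]], Model],  # hypothetical fine-tuning routine
    iterations: int,
) -> Model:
    """Run the teacher-student loop and return the final model."""
    if iterations < 0:
        raise ValueError("iterations must be non-negative")

    labelled = list(labelled)

    # Step 1: train the teacher on the manually labelled data only.
    teacher = train(labelled)

    for _ in range(iterations):
        # Step 2: the current teacher produces pseudo labels for the unlabelled data.
        pseudo = [(image, teacher(image)) for image in unlabelled]

        # Step 3: train a student on manually labelled plus pseudo-labelled data.
        student = train(labelled + pseudo)

        # Step 4: fine-tune the student on the manually labelled data.
        student = finetune(student, labelled)

        # Step 5: the student becomes the teacher for the next iteration.
        teacher = student

    return teacher
```

Passing the training routines in as arguments keeps the loop independent of any particular framework; with `iterations=0` the function simply returns the teacher from Step 1.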

Code and trained models are publicly available here.

The following tables present the results of the baseline model.


[1] Q. Xie, M.-T. Luong, E. Hovy, and Q. V. Le, “Self-training with noisy student improves ImageNet classification,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10687–10698.

[2] L.-C. Chen, R. G. Lopes, B. Cheng, M. D. Collins, E. D. Cubuk, B. Zoph, H. Adam, and J. Shlens, “Semi-supervised learning in video sequences for urban scene segmentation,” in European Conference on Computer Vision, 2020.

[3] F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature Methods, vol. 18, no. 2, pp. 203–211, 2021.