Self-supervised learning (SSL) is an approach to unsupervised representation learning that does not rely on human-provided labels. It creates auxiliary tasks on unlabeled input data, e.g., predicting masked words or matching two augmented views of the same image, and learns representations by solving these tasks (a minimal sketch of one such task appears below). SSL has demonstrated great success on images (e.g., MoCo, PIRL, SimCLR, DINO, MAE), speech (e.g., CPC, HuBERT, wav2vec), and text (e.g., word2vec, BERT, RoBERTa, GPT, OPT), and has shown promising results in other data modalities, including graphs, time series, and audio. On a wide variety of tasks, SSL achieves performance close to that of fully supervised approaches without using any human-provided labels.

Existing SSL research mostly focuses on improving empirical performance without a theoretical foundation. While the proposed SSL approaches are empirically effective on benchmarks, they are not well understood from a theoretical perspective, nor are their practical use cases well characterized. For example: Why do certain auxiliary tasks in SSL perform better than others? How many unlabeled examples does SSL need to learn a good representation? How is the performance of SSL affected by the choice of neural architecture? And practically, where do self-supervised models shine compared to traditional supervised models?

In the fourth iteration of this workshop, we continue to bridge this gap between theory and practice. We bring together researchers interested in SSL from various domains to discuss the theoretical foundations of empirically well-performing SSL approaches and how theoretical insights can further improve SSL's empirical performance.
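To make the "auxiliary task" idea concrete, here is a minimal, illustrative PyTorch sketch of a contrastive objective in the spirit of SimCLR's NT-Xent loss. It is simplified (it omits within-view negatives and the symmetric term), and the function name and toy inputs are our own for illustration, not from any referenced work.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.5):
    """Simplified contrastive (InfoNCE-style) loss over two augmented views.

    z1, z2: [N, D] embeddings of two views of the same N examples.
    Matching rows are positives; all other cross-view pairs are negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                     # [N, N] cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage: random tensors stand in for encoder outputs of two augmented views.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce_loss(z1, z2).item())
```

The auxiliary task here is to identify, among all examples in the batch, which embedding from the second view corresponds to each embedding from the first view; solving it requires no labels, only two augmentations of each input.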