Robust Contrastive Vision-Language Test-Time Adaptation

title: Robust Contrastive Vision-Language Test-Time Adaptation
start_date: 2025/04/01
schedule: 14h-16h
online: no
location_infos: Maryam Mirzakhani room (Borel building)
summary: Test-Time Adaptation (TTA) involves updating the model on-the-fly to handle covariate shifts in the data. Common strategies restrict updates to the batch-normalization parameters. Most methods minimize entropy as the adaptation objective (sketched below), promoting confident predictions and leveraging batch-level optimization to emulate the 'wisdom of the crowd.' However, entropy-based methods are suboptimal for vision-language models pre-trained with a contrastive loss. In this paper, we propose ClipTTA, a novel test-time adaptation method specifically tailored for CLIP. ClipTTA employs a soft contrastive image-text adaptation loss that better aligns with CLIP's pre-training objective. An analysis of the gradient of the ClipTTA loss and of its training dynamics shows its robustness to pseudo-label drift and class collapse. The ClipTTA loss can furthermore be extended with an Outlier Contrastive Exposure loss, so that the model effectively detects out-of-distribution samples while adapting only on in-distribution samples.
responsibles: Leclaire
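
To make the setting concrete, here is a minimal sketch of the entropy-minimization TTA baseline mentioned in the summary (TENT-style adaptation restricted to batch-normalization affine parameters). The function names, optimizer choice, and model interface are illustrative assumptions; this is the baseline that ClipTTA improves upon by replacing the entropy objective with a soft contrastive image-text loss, not the ClipTTA implementation itself.

```python
# Illustrative sketch of entropy-minimization test-time adaptation (TENT-style).
# Only the affine parameters of BatchNorm layers are passed to the optimizer,
# so only they are updated during adaptation.
import torch
import torch.nn as nn


def collect_bn_params(model: nn.Module):
    """Return only the affine (weight/bias) parameters of BatchNorm layers."""
    params = []
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d)):
            for p in (module.weight, module.bias):
                if p is not None:
                    params.append(p)
    return params


@torch.enable_grad()
def entropy_tta_step(model: nn.Module, images: torch.Tensor, optimizer) -> float:
    """One adaptation step: minimize mean prediction entropy on the test batch."""
    logits = model(images)                                   # (B, C) class logits
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1)  # per-sample entropy
    loss = entropy.mean()                                    # batch-level objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage (illustrative): adapt the BN layers of a classifier at test time.
# optimizer = torch.optim.SGD(collect_bn_params(model), lr=1e-3)
# for images in test_loader:
#     entropy_tta_step(model, images, optimizer)
```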