| title | Robust Contrastive Vision-Language Test-Time Adaptation |
|---|
| start_date | 2025/04/01 |
|---|
| schedule | 14h-16h |
|---|
| online | no |
|---|
| location_info | Maryam Mirzakhani room (Borel building) |
|---|
| summary | Test-Time Adaptation (TTA) involves updating the model on-the-fly to handle covariate shifts in the data. Common strategies restrict updates to batch normalization parameters. Most methods minimize entropy as an objective, promoting confident predictions and leveraging batch-level optimization to emulate the 'wisdom of the crowd.' However, entropy-based methods are suboptimal for vision-language models pre-trained with a contrastive loss. In this paper, we propose ClipTTA, a novel test-time adaptation method specifically tailored for CLIP. ClipTTA employs a soft contrastive image-text adaptation loss that better aligns with CLIP's pre-training objective. An analysis of the ClipTTA loss gradient and its training dynamics shows its robustness to pseudo-label drift and class collapse. The ClipTTA loss can furthermore be extended with an Outlier Contrastive Exposure (OCE) loss to better detect out-of-distribution samples while adapting only on in-distribution samples. (An illustrative sketch of such a soft contrastive loss follows this table.) |
|---|
| responsibles | Leclaire |
|---|
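To make the summary concrete, here is a minimal PyTorch sketch of a soft contrastive image-text adaptation loss in the spirit described above. This is not the authors' implementation: the function name, the temperature value, the stop-gradient soft pseudo-labels, and the symmetric text-to-image term are illustrative assumptions based only on the abstract.

```python
import torch
import torch.nn.functional as F

def clip_tta_soft_contrastive_loss(image_feats, text_feats, tau=0.01):
    """Soft contrastive image-text loss over a test batch (illustrative).

    image_feats: (B, D) L2-normalized CLIP image embeddings for the batch.
    text_feats:  (C, D) L2-normalized class-prompt text embeddings.
    tau: softmax temperature (assumed value, not from the paper).
    """
    logits = image_feats @ text_feats.t() / tau   # (B, C) image-text similarities
    q = logits.softmax(dim=-1).detach()           # soft pseudo-labels, stop-gradient

    # Image-to-text direction: soft cross-entropy against the pseudo-labels.
    loss_i2t = -(q * logits.log_softmax(dim=-1)).sum(dim=-1).mean()

    # Text-to-image direction: each class's pseudo-label mass is spread over
    # the batch, mirroring the symmetric form of CLIP's pre-training loss.
    q_t2i = q.t() / q.t().sum(dim=-1, keepdim=True).clamp_min(1e-8)
    loss_t2i = -(q_t2i * logits.t().log_softmax(dim=-1)).sum(dim=-1).mean()

    return 0.5 * (loss_i2t + loss_t2i)

if __name__ == "__main__":
    # Smoke test with random features standing in for real CLIP embeddings.
    B, C, D = 32, 10, 512
    img = F.normalize(torch.randn(B, D), dim=-1)
    txt = F.normalize(torch.randn(C, D), dim=-1)
    print(clip_tta_soft_contrastive_loss(img, txt).item())
```

Detaching the soft pseudo-labels is one common way such batch-level objectives avoid trivially reinforcing their own targets; the degree to which this prevents class collapse is exactly the kind of gradient analysis the abstract announces.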
Workflow history
| from state | to state | comment | date |
|---|
| submitted | published | | 2025/03/28 14:40 UTC |