Robust Contrastive Vision-Language Test-Time Adaptation

title: Robust Contrastive Vision-Language Test-Time Adaptation
start_date: 2025/04/01
schedule: 14h-16h
online: no
location_infos: Maryam Mirzakhani room (Borel building)
summary: Test-Time Adaptation (TTA) involves updating the model on-the-fly to handle covariate shifts in the data. Common strategies restrict updates to the batch-normalization parameters. Most methods minimize entropy as the adaptation objective (sketched below), promoting confident predictions and leveraging batch-level optimization to emulate the 'wisdom of the crowd.' However, entropy-based methods are suboptimal for vision-language models pre-trained with a contrastive loss. In this paper, we propose ClipTTA, a novel test-time adaptation method specifically tailored for CLIP. ClipTTA employs a soft contrastive image-text adaptation loss that better aligns with CLIP's pre-training objective. An analysis of the gradient of the ClipTTA loss and of its training dynamics shows its robustness to pseudo-label drift and class collapse. The ClipTTA loss can furthermore be extended with an Outlier Contrastive Exposure loss, so that the model effectively detects out-of-distribution samples while adapting only on in-distribution samples.
responsibles: Leclaire
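
To make the setting concrete, here is a minimal sketch of the entropy-minimization TTA baseline mentioned in the summary (TENT-style adaptation restricted to batch-normalization affine parameters). The function names, optimizer choice, and model interface are illustrative assumptions; this is the baseline that ClipTTA improves upon by replacing the entropy objective with a soft contrastive image-text loss, not the ClipTTA implementation itself.

```python
# Illustrative sketch of entropy-minimization test-time adaptation (TENT-style).
# Only the affine parameters of BatchNorm layers are passed to the optimizer,
# so only they are updated during adaptation.
import torch
import torch.nn as nn


def collect_bn_params(model: nn.Module):
    """Return only the affine (weight/bias) parameters of BatchNorm layers."""
    params = []
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d)):
            for p in (module.weight, module.bias):
                if p is not None:
                    params.append(p)
    return params


@torch.enable_grad()
def entropy_tta_step(model: nn.Module, images: torch.Tensor, optimizer) -> float:
    """One adaptation step: minimize mean prediction entropy on the test batch."""
    logits = model(images)                                   # (B, C) class logits
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1)  # per-sample entropy
    loss = entropy.mean()                                    # batch-level objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage (illustrative): adapt the BN layers of a classifier at test time.
# optimizer = torch.optim.SGD(collect_bn_params(model), lr=1e-3)
# for images in test_loader:
#     entropy_tta_step(model, images, optimizer)
```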