ACM MULTIMEDIA AT-ADD CHALLENGE 2026
The Grand Challenge on All-Type Audio Deepfake Detection
This section introduces the evaluation metrics and competition rules for both tracks. All participants are expected to follow these rules to ensure a fair and transparent evaluation process.
Evaluation Metrics
The official evaluation metric for this challenge is Macro-F1.
For Track 1, the task is binary classification (real vs. fake). The final score is the average of the per-class F1-scores of the real and fake classes, as shown below. This gives equal weight to both classes and keeps the evaluation robust under class imbalance.
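In equation form (notation introduced here for clarity; F_1^real and F_1^fake denote the per-class F1-scores on the evaluation set):

\mathrm{MacroF1}_{\text{Track 1}} = \frac{1}{2}\left(F_{1}^{\text{real}} + F_{1}^{\text{fake}}\right)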
For Track 2, the evaluation balances both the class distribution and the audio types. Specifically, for each of the four audio types t (speech, sound, singing, and music), we first compute a per-type Macro-F1 by averaging the F1-scores of the real and fake classes within that type, as shown below. We then average the four per-type Macro-F1 scores to obtain the final Track 2 score. The Track 2 metric therefore enforces a two-level balance:
(1) equal weighting between real and fake samples within each audio type, and
(2) equal weighting across different audio types.
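Written out (with F_{1,t}^real and F_{1,t}^fake denoting the per-class F1-scores within audio type t):

\mathrm{MacroF1}_{t} = \frac{1}{2}\left(F_{1,t}^{\text{real}} + F_{1,t}^{\text{fake}}\right), \qquad t \in \{\text{speech},\ \text{sound},\ \text{singing},\ \text{music}\}

\mathrm{Score}_{\text{Track 2}} = \frac{1}{4}\sum_{t} \mathrm{MacroF1}_{t}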
Competition Rules
Data Usage
Participants may use only the officially released training and development sets for model training, validation, model selection, and threshold determination. They are free to split the released data for internal training and validation, and may also merge the training and development sets for training. The progress and evaluation sets must not be used in any form for training, fine-tuning, pseudo-labeling, self-training, threshold tuning, or any other kind of model adaptation.
External Data
Except for the officially released data, the use of any external labeled or unlabeled audio data is strictly prohibited for training, fine-tuning, distillation, calibration, or pseudo-label construction, including self-generated synthetic data from external generative models or services. Participants must not introduce external datasets related to audio deepfake detection or other closely related authenticity-discrimination tasks, nor may they use models, checkpoints, or feature extractors that have been pre-trained, trained, or fine-tuned on such datasets.
Data Augmentation
Data augmentation is allowed only in the form of signal-level perturbation or transformation applied to the officially released data, rather than by introducing external audio data as additional training samples. Allowed augmentation strategies include, but are not limited to, additive noise, reverberation, compression, resampling, and signal-level augmentation methods such as RawBoost. Publicly available augmentation resources, such as MUSAN and RIR libraries, may be used only as augmentation sources and must not be treated as additional supervised training data.
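As an illustration, the following minimal Python sketch applies additive-noise augmentation on the fly to an officially released clip, assuming mono waveforms at a shared sample rate; the file paths and SNR range are hypothetical, and the MUSAN noise clip is used only as a perturbation source, never as an additional supervised training sample.

import numpy as np
import soundfile as sf

def add_noise(clean, noise, snr_db):
    """Mix a noise segment into an official training clip at a target SNR (dB)."""
    if len(noise) < len(clean):                      # loop the noise if it is shorter than the clip
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    clean_power = np.mean(clean ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

clean, sr = sf.read("official_train/clip_0001.wav")             # official AT-ADD sample (path hypothetical)
noise, _ = sf.read("musan/noise/free-sound/noise-0001.wav")     # MUSAN noise used only as a perturbation source
augmented = add_noise(clean, noise, snr_db=np.random.uniform(5, 20))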
Pretrained Models
Publicly available and traceable pretrained models are allowed, including self-supervised learning (SSL) models, audio large language models (ALLMs), multimodal large language models (MLLMs), and other general-purpose pretrained models, provided that their sources are clearly specified in the final metadata. However, any external models, checkpoints, or feature extractors that have been trained or fine-tuned with supervision outside AT-ADD for audio deepfake detection or other closely related authenticity classification tasks are strictly prohibited.
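For example, a publicly available, general-purpose SSL checkpoint may be used as a frontend feature extractor, as in the sketch below; the checkpoint name is only an example of a traceable source (it would have to be declared in the final metadata), and the random waveform stands in for an officially released training clip.

import torch
from transformers import AutoFeatureExtractor, AutoModel

ckpt = "facebook/wav2vec2-xls-r-300m"          # example of a traceable, general-purpose SSL checkpoint
fe = AutoFeatureExtractor.from_pretrained(ckpt)
ssl = AutoModel.from_pretrained(ckpt)

wave = torch.randn(16000)                       # stand-in for one second of 16 kHz official training audio
inputs = fe(wave.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    features = ssl(**inputs).last_hidden_state  # frame-level features for a downstream detector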
Fusion and Ensemble
Fusion and ensemble strategies are allowed, including feature-level fusion, score-level fusion, and decision-level fusion. The final submitted system may contain no more than 5 subsystems, and all components must comply with the same data usage rules.
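A minimal score-level fusion sketch is shown below; the scores, weights, and threshold are hypothetical, and in practice the weights and threshold would be tuned on the development set only.

import numpy as np

# Each row holds one subsystem's scores for the same evaluation clips (at most 5 subsystems allowed).
subsystem_scores = np.array([
    [0.91, 0.12, 0.40],   # subsystem 1
    [0.85, 0.20, 0.55],   # subsystem 2
    [0.95, 0.05, 0.35],   # subsystem 3
])
weights = np.array([0.4, 0.3, 0.3])             # fusion weights tuned on the development set
fused = weights @ subsystem_scores              # weighted average of subsystem scores
labels = (fused >= 0.5).astype(int)             # decision threshold chosen on the development set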
Reproducibility
The system corresponding to the final submitted score must be fully automatic and reproducible. The use of opaque closed-source APIs or any other external services that cannot be independently reproduced by the organizers is prohibited. Manual intervention in test set prediction, listening-based correction, or manual annotation is not allowed.
Compliance Check
For top-ranked teams, the organizers reserve the right to request a method description, a resource declaration, a model list, and inference code. If any violation of the data usage rules is found, or if a system is determined to be non-reproducible or to involve test-set leakage, the organizers reserve the right to disqualify the submission.