Explanations
references to varying or distinct groups or elements
Negative Logits
économies     -0.92
fumée         -0.80
doute         -0.71
nucléaire     -0.70
biens         -0.69
vectorielle   -0.69
contactez     -0.67
écou          -0.67
complètes     -0.66
żel           -0.66
Positive Logits
different      1.89
Different      1.74
Different      1.69
different      1.67
DIFFERENT      1.38
diferentes     1.33
various        1.28
verschiedener  1.27
verschiedene   1.27
Various        1.25
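Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto each token's unembedding vector and keeping the most positive and most negative results. The sketch below assumes that method (this page does not state how its lists were computed). The helper names `logit_effects` and `top_and_bottom`, and all vectors and token names, are hypothetical toy values, not data from this feature.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect = dot(feature decoder vector, token unembedding vector).
# All numbers and token names below are made up for illustration.

def logit_effects(decoder_vec, unembed):
    """Return {token: dot(decoder_vec, unembed[token])} for every token."""
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k largest effects (descending) and the k smallest (ascending)."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]
    positive = list(reversed(ranked[-k:]))
    return positive, negative

# Toy 2-dimensional "model" with a four-token vocabulary (hypothetical values).
decoder_vec = [1.0, -0.5]
unembed = {
    "alpha": [1.5, -0.5],   # effect 1.75
    "beta": [1.0, 0.0],     # effect 1.0
    "gamma": [-0.5, 0.6],   # effect about -0.8
    "delta": [-0.2, 0.8],   # effect about -0.6
}
positive, negative = top_and_bottom(logit_effects(decoder_vec, unembed), k=2)
print(positive)
print(negative)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; a real vocabulary of tens of thousands of tokens would more likely use a vectorized matrix product, but the ranking logic is the same.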
Activation Density: 0.121%
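Activation density is usually read as the fraction of token positions in a sample corpus on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and its `threshold` parameter are hypothetical, and the synthetic activations are constructed only to reproduce a density of 0.121%.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic example: 121 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(121):
    acts[i * 800] = 1.0  # spread the active positions across the sample

print(f"{activation_density(acts):.3%}")  # prints 0.121%
```

At this density the feature is active on roughly one token in 826, which is why its activating examples are sparse.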