INDEX
Explanations
mentions of brand names, particularly the luxury car brand Mercedes and the sportswear brand Adidas
Negative Logits
resa       -0.16
_handling  -0.15
egal       -0.14
eb         -0.14
RESH       -0.14
owied      -0.14
robat      -0.13
очка       -0.13
aks        -0.13
eg         -0.13
Positive Logits
-Benz      0.18
Bieber     0.16
inde       0.15
hausen     0.15
parer      0.15
γη         0.15
roller     0.14
warts      0.14
сий        0.14
-dir       0.13
Activations Density 0.001%