INDEX
Explanations
statements related to opinions or positions
taking a stance or position
Negative Logits
sánchez         -0.49
otomatig        -0.49
fantasi         -0.47
IsMutable       -0.44
esternos        -0.43
fernández       -0.42
bonjour         -0.41
autoradio       -0.41
permu           -0.41
inspirations    -0.41
Positive Logits
stance          0.98
position        0.86
Stellung        0.84
Position        0.84
position        0.81
POSITION        0.74
posición        0.73
posição         0.73
stances         0.72
Position        0.71
Activations Density: 0.027%