INDEX
Explanations
affirmative statements or expressions of agreement
Negative Logits
bitte           -0.52   (German, "please")
ссмо            -0.47
ftc             -0.45
Skocz           -0.44   (Polish, "jump")
tvguidetime     -0.44
potete          -0.43   (Italian, "you can")
__(             -0.43
goku            -0.43
foro            -0.42   (Spanish, "forum")
dernière        -0.41   (French, "last")
Positive Logits
Indeed           0.95
indeed           0.90
Indeed           0.89
的确             0.85   (Chinese, "indeed")
indeed           0.84
yes              0.83
Yes              0.83
inderdaad        0.81   (Dutch, "indeed")
agree            0.81
Agree            0.79
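The dashboard does not say how these logit values are computed. A common convention for sparse-autoencoder feature cards is to project the feature's decoder vector through the model's unembedding matrix (W_U) and list the most positive and most negative tokens. The sketch below assumes that convention. The function names `feature_logits` and `top_k`, and all the numbers, are hypothetical toy values, not the real model's weights.

```python
# Toy sketch: which tokens does a feature push up or down when it fires?
# Assumption (not stated in the dashboard): logit values are the feature's
# decoder vector projected through the unembedding matrix W_U.

def feature_logits(decoder_vec, unembed, vocab):
    """Return {token: logit contribution} for one feature.

    decoder_vec: d_model floats (the feature's decoder direction).
    unembed: d_model rows, each holding len(vocab) floats (W_U).
    """
    d_model = len(decoder_vec)
    return {
        tok: sum(decoder_vec[k] * unembed[k][j] for k in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_k(scores, k, largest=True):
    """Top-k (token, value) pairs; largest=False returns the most negative."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Hypothetical toy numbers: d_model = 2, five-token vocabulary.
vocab = ["Indeed", "yes", "agree", "bitte", "foro"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.6, 0.5, 0.4, -0.3, -0.2],
    [0.2, 0.2, 0.2, -0.4, -0.1],
]

scores = feature_logits(decoder_vec, unembed, vocab)
print("positive:", top_k(scores, 2))
print("negative:", top_k(scores, 2, largest=False))
```

With these toy weights, "Indeed" and "yes" come out on top and "bitte" and "foro" at the bottom, mirroring the shape of the tables above. A real dashboard would do the same projection over the full vocabulary.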
Activation density: 0.254%
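A density of 0.254% means the feature is active on roughly 1 in 400 tokens (1 / 0.00254 ≈ 394). The sketch below assumes density is the fraction of sampled token positions whose activation is strictly above zero. The dashboard does not state its sample size or threshold, and `activation_density` is a hypothetical helper.

```python
# Assumption: density = fraction of sampled token positions on which the
# feature's activation is strictly greater than zero.

def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, the feature fires on 254.
acts = [1.0] * 254 + [0.0] * (100_000 - 254)
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.254%
```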