Explanations
frequent expressions of skepticism or disagreement
Negative Logits
Token                 Logit
isque                 -0.17
chein                 -0.16
astes                 -0.15
iÄħ                   -0.15
only                  -0.14
umba                  -0.14
imbus                 -0.14
baugh                 -0.14
ennen                 -0.14
(no visible token)    -0.13
Positive Logits
Token      Logit
anymore    0.19
apas       0.15
327        0.15
.twitch    0.15
quat       0.14
ettle      0.13
y          0.13
*sp        0.13
287        0.13
TEE        0.13
Activations Density: 0.057%