INDEX
Explanations
affirmations and expressions of agreement
Negative Logits
ect      -0.21
Doch     -0.15
eb       -0.15
ogne     -0.15
celik    -0.14
ão       -0.14
tn       -0.14
quand    -0.14
andon    -0.14
nt       -0.14
Positive Logits
sure     0.20
yeah     0.20
sure     0.20
emek     0.17
redient  0.17
right    0.17
Yeah     0.17
tember   0.17
quake    0.16
ARB      0.16
Activations Density 0.015%