Explanations
expressions of disbelief or surprise
Negative Logits
.Companion   -0.17
edad         -0.16
okies        -0.15
риг          -0.15
ـت           -0.15
elman        -0.14
ovah         -0.14
oku          -0.14
aku          -0.14
abaj         -0.14
Positive Logits
dde          0.17
poll         0.16
polit        0.15
ampo         0.15
ire          0.14
uş           0.14
portion      0.14
alles        0.14
itta         0.14
dig          0.14
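Dashboards of this kind often build logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then taking the k lowest and k highest scores. The page does not say how its lists were computed, so the sketch below is an assumption. The names `logit_effects`, `decoder_dir`, and `unembed` are hypothetical, and it applies no normalization.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return (most negative, most positive) token/logit pairs.

    decoder_dir: list of d_model floats (hypothetical feature direction)
    unembed: d_model rows, each a list of vocab_size floats
    vocab: list of token strings, aligned with unembed columns
    """
    d_model = len(decoder_dir)
    # Score each token: dot product of the direction with its unembedding column.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


neg, pos = logit_effects([1.0, -1.0], [[0.5, 0.1, -0.2], [0.1, 0.3, 0.2]], ["a", "b", "c"], k=1)
print(neg, pos)
```

Small absolute values like the ±0.14 to ±0.17 shown above are what you would expect when no single token dominates the projection.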
Activation density: 0.155%
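Activation density is usually read as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero and that the density is reported as a percentage. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# A feature that fires on 155 out of 100,000 sampled tokens.
acts = [0.0] * 99_845 + [1.0] * 155
print(activation_density(acts))
```

Under these assumptions, 0.155% means the feature is active on roughly 1 token in 650, which fits a narrow concept such as expressions of disbelief or surprise.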