INDEX
Explanations
expressions of amazement or excitement
Negative Logits
emean
-0.17
agna
-0.15
عامة
-0.15
출
-0.14
сь
-0.14
nech
-0.13
(金
-0.13
cử
-0.13
착
-0.13
jourd
-0.13
Positive Logits
zers
0.33
zer
0.27
za
0.25
ser
0.23
zas
0.21
another
0.20
talk
0.19
Factor
0.19
ow
0.19
indeed
0.19
Activations Density 0.033%