Explanations
phrases indicating emphasis or affirmation
Negative Logits
utherford   -0.20
eyer        -0.17
throws      -0.17
izzo        -0.16
ahl         -0.15
敷          -0.15
ultz        -0.14
ango        -0.14
egers       -0.14
ksen        -0.14
Positive Logits
δη          0.33
tức         0.29
就是        0.29
тобто       0.28
heiß        0.27
即          0.25
decir       0.25
즉          0.24
つ          0.22
есть        0.22
Activations Density 0.053%