Explanations
words related to physical movement or actions
Negative Logits
kek           -0.19
InBackground  -0.17
esson         -0.16
kenin         -0.15
emax          -0.15
ÅĤe           -0.14
ooter         -0.14
åīĤ           -0.14
лади          -0.14
bine          -0.14
Positive Logits
ysis          0.19
yg            0.17
asel          0.17
é϶            0.17
olia          0.16
yer           0.15
DP            0.14
ierz          0.14
ench          0.14
utor          0.14
Activations Density: 0.023%