Explanations
expressions related to physical sensations and emotional experiences
Negative Logits
McCabe       -0.16
ODE          -0.15
bid          -0.15
dost         -0.14
preca        -0.14
плав         -0.14
uai          -0.13
сви          -0.13
esda         -0.13
ode          -0.13
Positive Logits
egin         0.19
ropy         0.15
itter        0.15
kin          0.15
iglia        0.14
need         0.14
کت           0.14
Moran        0.14
differently  0.14
чу           0.14
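Logit lists like the ones above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the most negative and most positive entries. We assume, without confirmation from this page, that the dashboard computes them this way. The sketch below is minimal pure Python. The names `w_dec` (the feature's decoder vector, length d_model) and `w_u` (unembedding, d_model rows by vocab-size columns) are hypothetical, and the toy numbers are illustrative only, not this feature's weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(w_dec: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Direct logit effect of one feature: w_dec @ W_U, one value per vocab token.

    Assumes w_u is laid out as d_model rows, each holding vocab-size entries.
    """
    d_model = len(w_dec)
    vocab = len(w_u[0])
    return [sum(w_dec[i] * w_u[i][t] for i in range(d_model)) for t in range(vocab)]


def top_and_bottom(
    effects: Sequence[float], tokens: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order, so most negative first
    positive = ranked[::-1][:k]    # reversed, so most positive first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of three tokens.
    w_dec = [1.0, -0.5]
    w_u = [[0.2, -0.1, 0.4],
           [0.1, 0.3, -0.2]]
    negative, positive = top_and_bottom(logit_effects(w_dec, w_u), ["a", "b", "c"], 2)
    print("Negative Logits", [(t, round(v, 2)) for t, v in negative])
    print("Positive Logits", [(t, round(v, 2)) for t, v in positive])
```

Because the effect is a single dot product per token, it reflects only the feature's direct path to the output logits, not any influence routed through later layers.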
Activation density: 0.055%
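We read activation density as the percentage of sampled tokens on which the feature's activation is above zero; that reading is an assumption, since the page does not define the metric. Under it, 0.055% means the feature fires on roughly 55 of every 100,000 tokens. The following is a minimal sketch; the `threshold` parameter is a hypothetical knob, not a documented setting.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes `activations` is a flat list of per-token activations for one feature.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 55 firing tokens out of 100,000.
    sample = [0.0] * 99_945 + [1.2] * 55
    density = activation_density(sample)
    print(f"{density:.3f}%")  # prints 0.055%
```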