INDEX
Explanations
words related to the concept of "can" or ability
Negative Logits
thon  -0.17
thác  -0.16
ρια  -0.15
baş  -0.15
itting  -0.15
ITY  -0.15
erti  -0.15
ry  -0.14
昭  -0.14
oles  -0.14
Positive Logits
ing  0.19
woord  0.18
elope  0.18
elerik  0.16
ler  0.16
y  0.16
Absolute  0.16
yaw  0.15
ucket  0.15
uario  0.15
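The page does not say how the two logit lists are produced. One common convention is to compute the feature's direct effect on every vocabulary token's logit (decoder direction times unembedding) and then keep the most negative and most positive entries. The sketch below assumes that the per-token effects are already available as a plain dict. It shows only the ranking step. The function name `top_bottom_logits` and the choice of `k` are hypothetical and are not part of the dashboard.

```python
def top_bottom_logits(logit_effects, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption (hypothetical helper): logit_effects maps each vocabulary
    token to the feature's direct effect on that token's logit.
    """
    # Sort ascending by effect so the most negative entries come first.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # strongest suppression first
    positive = ranked[::-1][:k]  # strongest promotion first
    return negative, positive


if __name__ == "__main__":
    sample = {"thon": -0.17, "thác": -0.16, "ing": 0.19, "woord": 0.18, "y": 0.16}
    neg, pos = top_bottom_logits(sample, k=2)
    print("Negative:", neg)
    print("Positive:", pos)
```

With the sample values taken from the tables above, the two lists come back in the same order the dashboard uses: most extreme value first.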
Activations Density 0.034%
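Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0, and that the density is reported as a percentage over all sampled tokens. Neither detail is stated on this page, and the helper name `activation_density` is hypothetical. Under these assumptions, a density of 0.034% means roughly 34 active positions per 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold.

    Assumption (hypothetical helper): a position counts as active when
    its activation is strictly greater than threshold.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 34 active positions out of 100,000 sampled tokens.
    acts = [1.0] * 34 + [0.0] * (100_000 - 34)
    print(f"{activation_density(acts):.3f}%")
```

A threshold parameter is included because some dashboards ignore tiny positive activations. Raising it lowers the reported density.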