INDEX
Explanations
verbs related to initiation or starting actions
Negative Logits
erner     -0.19
roma      -0.16
arna      -0.15
erna      -0.14
aca       -0.14
ovÃŃ      -0.14
fcn       -0.14
kea       -0.13
\views    -0.13
mos       -0.13
Positive Logits
лей       0.16
ãĥ¼ãĤ¸    0.15
uckle     0.15
auga      0.14
çĦ        0.14
Ñī        0.14
Äĩe       0.14
edom      0.14
igr       0.14
arris     0.14
Activations Density 0.052%
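
Several token strings in the logit lists, such as `ovÃŃ`, `Ñī`, `Äĩe` and `ãĥ¼ãĤ¸`, look like the printable stand-ins that GPT-2 style byte-level BPE vocabularies use for raw bytes. The page does not say which tokenizer produced them, so the sketch below is only an assumption: it rebuilds the byte-to-character table used by GPT-2's encoder and inverts it. The names `byte_decoder_table` and `decode_token` are hypothetical helpers, not part of any tool shown here.

```python
def byte_decoder_table():
    """Map each stand-in character back to its byte value.

    Assumption: the vocabulary follows the GPT-2 byte-level BPE convention,
    where printable bytes map to themselves and the remaining bytes are
    assigned code points starting at U+0100.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {chr(cp): b for b, cp in zip(byte_values, code_points)}


_TABLE = byte_decoder_table()


def decode_token(token):
    """Decode a byte-level token string to text.

    Tokens containing characters outside the table (for example text that
    is already decoded) are returned unchanged. Incomplete UTF-8 sequences
    are rendered with the Unicode replacement character.
    """
    if any(ch not in _TABLE for ch in token):
        return token
    raw = bytes(_TABLE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ovÃŃ", "Ñī", "Äĩe", "ãĥ¼ãĤ¸", "лей", "uckle"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ovÃŃ` reads as `oví`, `Ñī` as `щ`, `Äĩe` as `će` and `ãĥ¼ãĤ¸` as `ージ`. The token `çĦ` corresponds to the first two bytes of a three-byte UTF-8 character, so it cannot be decoded on its own.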