Explanations
actions related to bending or leaning downward
Negative Logits
eneg     -0.18
onas     -0.17
ugin     -0.17
iali     -0.15
pac      -0.14
utin     -0.14
onaut    -0.14
abeth    -0.14
icana    -0.14
ubat     -0.14

Positive Logits
erek     0.16
Adopt    0.15
AVIS     0.15
lv       0.14
otti     0.14
éĸĢ      0.14
essel    0.13
annon    0.13
igo      0.13
533      0.13
Activations Density: 0.043%