Explanations
reflexive actions, self-directed actions
Negative Logits
actually    0.59
riet        0.54
Oh          0.54
native      0.54
thôi        0.53
oh          0.53
icione      0.52
actual      0.52
ించిన       0.52
ói          0.51
Positive Logits
себя        3.50
oneself     3.48
themselves  3.43
yourself    3.27
herself     3.26
himself     3.25
себе        3.22
ourselves   3.12
zichzelf    3.08
自己        2.99
Activations Density 0.293%
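
The density figure above can be read as a sparsity measure. The following is a minimal sketch, assuming that "activation density" means the fraction of sampled tokens on which the feature's activation is above zero; the dashboard does not state its exact definition, and the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (number of tokens sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 of 1000 tokens activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.293% would mean the feature fires on roughly 3 in every 1000 sampled tokens, which fits a feature that responds mainly to reflexive pronouns.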