INDEX
Explanations
first-person pronouns and active involvement or engagement in tasks
Negative Logits
Token     Logit
styl      -0.16
/loader   -0.16
urm       -0.15
Jesse     -0.15
(format   -0.14
onn       -0.14
rlen      -0.14
atus      -0.13
lion      -0.13
ardon     -0.13
Positive Logits
Token     Logit
.inc      0.17
essian    0.16
eprom     0.16
สม        0.16
_defs     0.15
HITE      0.15
imm       0.15
ernes     0.15
Haz       0.14
Gilles    0.14
Activation Density: 0.002%