INDEX
Explanations
words related to intentional and conscious actions
Negative Logits
Token     Logit
sel       -0.16
urm       -0.15
eler      -0.15
ANGLES    -0.15
presso    -0.14
pector    -0.14
ixel      -0.14
cum       -0.14
arend     -0.14
sonst     -0.14
Positive Logits
Token     Logit
mente     0.17
izia      0.15
ÑĪин      0.15
297       0.15
-mf       0.15
398       0.14
aways     0.14
evin      0.14
atio      0.14
281       0.14
Activations Density 0.010%