Explanations
expressions of intent or purpose related to goals and aims
Negative Logits
Token     Logit
uous      -0.19
es        -0.17
edom      -0.16
esa       -0.15
an        -0.15
astle     -0.14
olio      -0.14
_pb       -0.14
icol      -0.14
nee       -0.14
Positive Logits
Token     Logit
lessly    0.28
yro       0.19
Aim       0.19
/target   0.17
aim       0.17
erais     0.16
tır       0.16
aim       0.16
тесь      0.16
/go       0.15
Activations Density 0.014%