Explanations
phrases related to goals and aims
statements of goals or intentions
Negative Logits
odox     -0.63
vae      -0.60
regular  -0.58
affe     -0.58
aro      -0.56
quin     -0.55
iery     -0.55
occas    -0.55
icol     -0.52
eor      -0.52
Positive Logits
to            1.13
to            0.96
maximizing    0.84
ensuring      0.80
preservation  0.79
To            0.77
simple        0.74
simplicity    0.73
educating     0.71
TO            0.71
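The two lists rank tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such ranked lists can be pulled from a set of per-token logit effects, assuming those effects are already available as (token, value) pairs. The function name `top_logits` is hypothetical, and the sample reuses a few entries from the lists. Pairs are used rather than a dict because the same surface string (here "to") can appear more than once.

```python
from typing import List, Tuple

Pair = Tuple[str, float]


def top_logits(effects: List[Pair], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, value) pairs.

    Hypothetical helper: assumes `effects` already holds one logit effect per token.
    """
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by value
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


sample = [("to", 1.13), ("maximizing", 0.84), ("odox", -0.63), ("vae", -0.60), ("simple", 0.74)]
pos, neg = top_logits(sample, 2)
print(pos)  # [('to', 1.13), ('maximizing', 0.84)]
print(neg)  # [('odox', -0.63), ('vae', -0.6)]
```

If `k` exceeds half the number of tokens, the two lists will overlap. A real dashboard would typically rank over the full vocabulary, where this does not arise.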
Activation density: 0.127%
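Activation density is read here as the share of evaluated tokens on which the feature fires. Under that reading, 0.127% means roughly 1.27 activating tokens per 1,000. A minimal sketch, assuming "fires" means an activation strictly greater than zero. The helper name and the toy activations are hypothetical.

```python
def activation_density(activations: list[float]) -> float:
    """Fraction of tokens whose activation is strictly positive (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Toy input: one active token out of 1,000.
acts = [0.0] * 999 + [2.5]
print(f"{activation_density(acts):.3%}")  # 0.100%
```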