INDEX
Explanations
expressions of helplessness or lack of agency
Negative Logits
.clipsToBounds    -0.15
comport           -0.14
sing              -0.14
ounge             -0.14
ulace             -0.14
ugo               -0.13
lug               -0.13
iola              -0.13
WhiteSpace        -0.13
ilenames          -0.13
Positive Logits
action            0.28
steps             0.28
nothing           0.28
action            0.26
Action            0.25
-action           0.25
ACTION            0.25
ACTION            0.24
done              0.24
Steps             0.23
Activations Density: 0.159%