INDEX
Explanations
phrases and words related to motivation and pushing boundaries
Negative Logits
/by        -0.17
kt         -0.17
jac        -0.16
fred       -0.16
uco        -0.16
bsolute    -0.15
vez        -0.15
kir        -0.15
throp      -0.15
ullen      -0.14
Positive Logits
aside       0.35
buttons     0.30
boundaries  0.28
envelope    0.28
harder      0.27
forward     0.27
limits      0.27
ahead       0.26
buttons     0.25
hard        0.25
Activations Density 0.041%