INDEX
Explanations
references to helping or supporting others
Negative Logits
ignet          -0.17
cht            -0.16
ActionTypes    -0.15
orest          -0.15
FAILURE        -0.14
allow          -0.14
usta           -0.14
Bread          -0.14
.struts        -0.13
coop           -0.13
Positive Logits
understand     0.21
get            0.17
achieve        0.17
kaar           0.17
izzy           0.16
/us            0.15
767            0.15
avoid          0.15
feel           0.15
transition     0.15
Activations Density 0.063%