Explanations
calls to action or invitations to participate in some activity
Negative Logits
Token     Logit
569       -0.07
ersh      -0.07
amedi     -0.06
iring     -0.06
Alone     -0.06
Enemies   -0.06
298       -0.06
          -0.06
oce       -0.06
alone     -0.06
Positive Logits
Token     Logit
attention 0.08
IPS       0.08
-action   0.08
action    0.07
UNUSED    0.07
orney     0.07
arms      0.07
_codec    0.07
Duty      0.07
icode     0.07
Activations Density 0.011%