INDEX
Explanations
references to taking action in response to various issues or crises
Negative Logits
TR          -0.15
arge        -0.14
inea        -0.14
ongs        -0.14
vidence     -0.14
elder       -0.14
asti        -0.13
tryside     -0.13
TRA         -0.13
ayer        -0.13
Positive Logits
action      0.19
towards     0.18
actions     0.18
ACTION      0.18
.ACTION     0.18
/action     0.17
-actions    0.17
(action     0.17
Action      0.17
-action     0.17
Activations Density: 0.083%