INDEX
Explanations
instances of government authority and the use of force
Negative Logits
enemy       -0.16
/ws         -0.15
/rfc        -0.15
antiago     -0.14
enemy       -0.14
lush        -0.14
Enemy       -0.14
defense     -0.14
hid         -0.14
.decorate   -0.14
Positive Logits
force       0.38
violence    0.33
FORCE       0.32
force       0.31
Force       0.31
-force      0.29
resort      0.29
Violence    0.28
physical    0.27
Force       0.27
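A minimal sketch of how lists like the two above could be produced, assuming (not confirmed by this page) that a feature's logit effect on a token is the dot product of the feature's decoder direction with that token's unembedding vector. The helpers `logit_effects` and `top_and_bottom`, and the toy 2-dimensional vectors, are hypothetical and chosen only to mirror a few of the values listed.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: logit effect per token = dot product of the feature's
# decoder direction with that token's unembedding vector (an assumption).
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

# Hypothetical helper: split tokens into the k most positive and k most negative.
def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative effects first
    positive = ranked[::-1][:k]    # most positive effects first
    return positive, negative

# Toy 2-dimensional example; the vectors are made up for illustration.
decoder_dir = [1.0, 0.0]
unembed = {
    "force": [0.38, 0.10],
    "violence": [0.33, 0.00],
    "enemy": [-0.16, 0.50],
    "defense": [-0.14, 0.20],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Positive:", positive)
print("Negative:", negative)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; a real model would apply the same ranking over the full vocabulary.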
Activation Density: 0.199%
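Reading activation density as the percentage of token positions on which the feature is active (an assumed definition, with "active" taken to mean an activation above 0), 0.199% corresponds to roughly 2 of every 1,000 tokens. The hypothetical `activation_density` helper below computes that figure from a list of per-token activations; the sample data is made up.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions where the feature's
    activation exceeds `threshold` (assumed 0, i.e. any positive firing)."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Made-up sample: the feature fires on 2 of 1,000 tokens.
sample = [0.0] * 998 + [1.7, 0.4]
print(f"Activation density: {activation_density(sample):.3f}%")
```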