Explanations
terms related to nonviolence and terms related to violence
terms related to violence and nonviolence
Negative Logits
rot        -0.76
rosc       -0.76
ween       -0.75
GPU        -0.72
Tycoon     -0.71
ulas       -0.70
cedented   -0.69
awed       -0.69
ummies     -0.69
ibo        -0.68
Positive Logits
nonviolent     0.95
disobedience   0.75
cessation      0.71
violent        0.71
pacif          0.71
activism       0.70
measure        0.70
agitation      0.69
justice        0.68
resistance     0.68
Activations Density 0.008%