INDEX
Explanations
mentions of non-violence, social activism or resistance
terms associated with violence and its non-violent alternatives
Negative Logits
Token     Logit
ibal      -0.76
rans      -0.64
verages   -0.64
saw       -0.63
nan       -0.62
Haf       -0.61
abase     -0.61
older     -0.61
Lumpur    -0.61
ween      -0.61
Positive Logits
Token            Logit
theless          0.85
iferation        0.76
istance          0.75
DragonMagazine   0.72
iterranean       0.69
anmar            0.64
istant           0.63
ances            0.61
ensical          0.61
otine            0.61
Activations Density: 0.055%