Explanations
phrases related to physical altercations and conflicts
instances of violence or physical altercations
Negative Logits
Token        Logit
ngth         -0.75
oliberal     -0.72
preference   -0.69
implants     -0.66
erton        -0.66
cert         -0.65
algia        -0.65
biases       -0.64
Reviewer     -0.64
20439        -0.63
Positive Logits
Token         Logit
ensued        1.45
erupted       1.19
erupt         1.11
escalated     1.08
pandemonium   1.04
escalate      0.96
escalating    0.95
ens           0.95
escal         0.94
between       0.93
Activations Density: 0.327%