INDEX
Explanations
actions related to physical aggression or harm, particularly involving law enforcement or authority figures
actions and events related to violence and law enforcement
Negative Logits
Token           Logit
audits          -0.71
memos           -0.67
controversies   -0.67
zbollah         -0.66
promotions      -0.64
collaborations  -0.64
pmwiki          -0.62
documentaries   -0.61
scandals        -0.61
disadvantages   -0.60
Positive Logits
Token      Logit
him        1.66
him        1.27
them       1.22
them       1.04
HIM        0.99
whoever    0.88
Him        0.87
THEM       0.86
her        0.84
everybody  0.81
Activations Density: 0.312%