Explanations
actions related to authority and conflict
mentions of violent actions and criminal behavior
Negative Logits
Token            Logit
£ı               -0.83
TPP              -0.66
curated          -0.65
spons            -0.65
endorsements     -0.62
stoked           -0.60
leaks            -0.59
controversies    -0.58
hra              -0.58
stadiums         -0.58
Positive Logits
Token            Logit
him              1.39
them             1.15
whoever          1.01
him              1.01
Mr               0.98
Ms               0.87
them             0.86
HIM              0.83
everyone         0.80
Mrs              0.79
Activations Density: 0.532%