Explanations
keywords related to political figures and controversial actions or statements
Negative Logits
ACTED        -0.69
occup        -0.68
PDATE        -0.66
carriers     -0.66
easing       -0.63
explosives   -0.63
Overwatch    -0.61
pref         -0.60
alpha        -0.60
ultrasound   -0.59
Positive Logits
ict          1.05
inas         1.00
ildo         0.98
enture       0.97
isson        0.96
onna         0.96
illon        0.94
iral         0.94
itto         0.92
illian       0.92
Activations Density: 0.020%