Explanations
references to political entities and events
Negative Logits
Token        Logit
agem         -0.18
Savage       -0.16
commod       -0.15
TM           -0.15
ynos         -0.15
urat         -0.15
áž           -0.14
ä»ģ          -0.14
Lift         -0.14
ereotype     -0.14
Positive Logits
Token        Logit
antim        0.20
antib        0.20
radical      0.20
antic        0.19
rad          0.19
leaders      0.18
moderate     0.17
anti         0.17
_rad         0.17
social       0.17
Activations Density: 0.049%