INDEX
Explanations
words related to politics and politicians
mentions of political entities or terms
Negative Logits
RED          -0.71
uran         -0.70
pity         -0.70
urous        -0.69
Carbuncle    -0.68
uring        -0.67
Moor         -0.67
lamm         -0.67
MQ           -0.66
URN          -0.66
Positive Logits
icians       1.17
ifact        1.10
ician        1.03
icial        0.96
ically       0.94
Polit        0.89
eness        0.86
correctness  0.83
ique         0.78
Pengu        0.77
Activations Density 0.009%