Explanations
words related to politics
references to political figures and entities
Negative Logits
Carbuncle  -0.72
PORT       -0.70
Warrant    -0.68
uran       -0.67
pity       -0.67
RED        -0.66
vasive     -0.66
urous      -0.66
DERR       -0.66
LEASE      -0.65
Positive Logits
icians       1.28
ician        1.16
ifact        1.09
ically       1.04
icial        1.01
eness        0.90
Polit        0.90
ique         0.83
correctness  0.83
icity        0.78
Activations Density 0.008%
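To make the density figure concrete, here is a minimal sketch. It assumes that "activation density" means the fraction of token positions on which the feature's activation is above zero. That definition and the helper name `activation_density` are illustrative assumptions, not taken from this page. Under that reading, 0.008% means the feature fires on roughly 8 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = (# active tokens) / (# tokens).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustration: 8 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 12_500] = 1.0  # mark 8 evenly spaced positions as active

print(f"{activation_density(acts):.3%}")  # prints 0.008%
```

With a density this low, the feature is very sparse. It stays silent on almost all text and activates only in narrow contexts, consistent with the politics-related explanations above.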