Explanations
mentions of political figures or discussions
terms related to politics and political discussions
Negative Logits
Warrant       -0.72
pity          -0.70
Carbuncle     -0.69
MQ            -0.67
Moor          -0.65
PORT          -0.63
Ved           -0.63
Thumbnails    -0.62
lights        -0.61
Viking        -0.60
Positive Logits
icians        1.39
ician         1.33
ifact         1.20
icial         1.11
ically        1.10
eness         0.99
ique          0.88
cn            0.88
ileaks        0.82
icity         0.81
Activations Density 0.021%