INDEX
Explanations
mentions of political figures or topics
references to political topics
Negative Logits
Token        Logit
Warrant      -0.74
warrants     -0.68
pity         -0.67
Carbuncle    -0.66
recall       -0.62
Gutenberg    -0.62
PORT         -0.61
MQ           -0.61
Ved          -0.61
FACE         -0.60
Positive Logits
Token        Logit
icians       1.64
ician        1.50
ifact        1.26
ically       1.22
eness        1.17
icial        1.09
ileaks       0.99
ique         0.97
icking       0.95
kov          0.93
Activations Density 0.026%