INDEX
Explanations
proper nouns related to politics and news
references to political figures and events
Negative Logits
lihood       -0.75
_.           -0.71
CONCLUS      -0.70
States       -0.66
tics         -0.66
destro       -0.64
6000         -0.63
smugglers    -0.63
Closure      -0.62
MU           -0.62
Positive Logits
embattled    0.78
Donald       0.68
superstar    0.63
dinand       0.61
Antonio      0.60
John         0.60
Adolf        0.60
iconic       0.59
Tim          0.58
...          0.57
Activations Density 0.283%