INDEX
Explanations
mentions of the White House
Negative Logits
ITAL       -0.83
odcast     -0.77
raints     -0.72
orsi       -0.69
SIGN       -0.67
Occup      -0.67
ENDED      -0.66
tics       -0.66
olls       -0.65
WATCHED    -0.65
Positive Logits
caps            1.07
White           1.04
berry           0.96
house           0.93
supremacist     0.92
supremacists    0.91
hall            0.91
suprem          0.90
zee             0.83
Sox             0.82
Activations Density 0.014%