INDEX
Explanations
mentions of the White House
Negative Logits
ITAL        -0.85
odcast      -0.75
raints      -0.74
orsi        -0.73
tics        -0.71
trak        -0.70
WATCHED     -0.69
ModLoader   -0.68
olls        -0.66
rg          -0.65
Positive Logits
berry        0.95
caps         0.95
house        0.90
hall         0.88
White        0.88
suprem       0.83
zee          0.82
supremacist  0.81
Sox          0.80
horse        0.79
Activations Density: 0.015%