Explanations
names of political figures, particularly advisors and strategists
names of political figures and strategists
Negative Logits
EVA        -0.80
phis       -0.79
ivity      -0.76
venge      -0.73
played     -0.72
EP         -0.70
ires       -0.68
oxide      -0.67
race       -0.66
SERVICE    -0.64
Positive Logits
Bannon          1.18
espie           0.92
olini           0.80
fried           0.80
Yiannopoulos    0.80
Jinping         0.78
halla           0.76
itbart          0.76
andowski        0.76
aides           0.76
Activations Density 0.008%