INDEX
Explanations
mentions of the word "Trump."
repeated mentions of the word "trumps" and other related variations
Negative Logits
raq         -0.84
ANC         -0.84
GAN         -0.79
ANCE        -0.77
zai         -0.76
srfAttach   -0.75
RW          -0.72
ANA         -0.71
BIL         -0.70
WORK        -0.70
Positive Logits
hift        1.00
manship     1.00
paces       0.95
pace        0.87
ilver       0.85
peed        0.84
poons       0.82
oulos       0.82
uits        0.79
hops        0.79
Activations Density: 0.028%