Explanations
words related to political figures and activities
terms related to political contexts and events
Negative Logits
Alto      -0.61
flush     -0.59
stadt     -0.59
leasing   -0.58
alone     -0.56
Weber     -0.56
tails     -0.55
Mist      -0.55
Desktop   -0.54
door      -0.54
Positive Logits
correctness   0.75
speeches      0.69
ndum          0.67
clinton       0.65
pamph         0.63
slogans       0.61
aroo          0.61
xual          0.61
EStream       0.60
hostage       0.60
Activations Density: 0.600%