INDEX
Explanations
phrases related to political figures and events
references to political figures or discussions about politics
Negative Logits
Token       Logit
decomp      -0.72
hift        -0.68
comr        -0.66
JPEG        -0.64
cro         -0.61
scrim       -0.61
ĨĴ          -0.61
unaccount   -0.58
Mess        -0.57
Rivals      -0.56
Positive Logits
Token       Logit
s           1.22
ship        1.06
alty        0.95
ufact       0.93
tarian      0.89
sf          0.88
tal         0.88
gins        0.87
ity         0.87
sg          0.86
Activations Density: 0.115%