INDEX
Explanations
references to candidates in a political context
Negative Logits
fully      -0.19
thing      -0.17
NAL        -0.15
eming      -0.15
lix        -0.15
coming     -0.14
ечение     -0.14
keit       -0.14
ides       -0.14
また       -0.14
Positive Logits
hood       0.19
imension   0.16
upiter     0.16
ucci       0.15
ries       0.15
regs       0.15
íž         0.15
BOVE       0.14
ura        0.14
leri       0.14
Activations Density: 0.034%