Explanations
references to political ideologies, particularly left-wing and right-wing distinctions
Negative Logits
FX        -0.15
gn        -0.14
azi       -0.14
ubre      -0.14
leston    -0.14
Edgar     -0.14
procs     -0.14
bou       -0.14
ãĥĭãĥ¼    -0.13
icip      -0.13
Positive Logits
yal       0.16
licant    0.15
flen      0.14
WARD      0.14
muschi    0.14
aticon    0.14
sworth    0.13
vron      0.13
olina     0.13
artz      0.13
Activations Density: 0.022%