INDEX
Explanations
references to political figures and their perceived characteristics or actions
Negative Logits
å¢ĥ          -0.16
IFn          -0.15
');");↵      -0.15
oleÄį        -0.15
ventions     -0.15
üle          -0.14
ạng          -0.14
stery        -0.14
mploy        -0.14
ertainment   -0.14
Positive Logits
figure       0.19
moderate     0.18
outsider     0.18
abras        0.17
politician   0.17
polar        0.17
prote        0.17
Rhodes       0.17
consum       0.16
cage         0.16
Activations Density: 0.174%