INDEX
Explanations
names, seemingly relating to political figures or leaders
mentions of specific political figures
Negative Logits
Token       Logit
nces        -0.79
Reviewer    -0.78
AX          -0.78
mble        -0.71
TOP         -0.69
llo         -0.69
ngth        -0.67
Torrent     -0.67
ĻĤ          -0.67
furt        -0.67
Positive Logits
Token       Logit
Sharif      1.23
icum        0.87
abad        0.86
smith       0.84
merce       0.77
ding        0.76
shire       0.74
istani      0.72
issa        0.72
Hussein     0.71
Activations Density 0.012%
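
The dashboard does not say how the density figure is computed. As a rough sketch, assuming it is the share of sampled token positions on which the feature activates above zero, the arithmetic looks like this. The function name, the threshold, and the sample data are all hypothetical and chosen only to illustrate a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all sampled positions).
    The dashboard itself does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 25,000 tokens.
sample = [0.0] * 24_997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.012%
```

Under that assumption, a density of 0.012% means the feature fires on roughly 1 token in 8,000, which fits a feature tied to a narrow set of names.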