INDEX
Explanations
references to political leaders and their actions
Negative Logits
axe     -0.16
ements  -0.16
gon     -0.16
감      -0.15
Uni     -0.14
ート    -0.14
pairs   -0.14
iras    -0.14
BORDER  -0.14
IVA     -0.14
Positive Logits
Naw     0.22
PT      0.21
PPP     0.20
Mush    0.18
Müş     0.18
Нав     0.17
PPP     0.17
Alta    0.17
NTN     0.17
Im      0.17
Activations Density 0.022%