INDEX
Explanations
references to political parties and geographic locations
Negative Logits
uder     -0.17
521      -0.16
sank     -0.15
sinks    -0.15
OOK      -0.15
andler   -0.15
rema     -0.15
619      -0.14
umbo     -0.14
ehr      -0.14
Positive Logits
tering   0.19
ota      0.16
रण       0.15
каз      0.14
ırı      0.14
üstü     0.14
.dds     0.14
OTA      0.14
/INFO    0.13
erral    0.13
Activations Density: 0.047%