Explanations
references to historical figures or religious leaders
Negative Logits
çoğ: -0.16
Değer: -0.15
hâl: -0.15
ataire: -0.15
deş: -0.14
бли: -0.14
abaj: -0.14
rej: -0.14
REQ: -0.14
evice: -0.14
Positive Logits
Ankara: 0.20
Pam: 0.19
Asian: 0.18
Asia: 0.18
Asia: 0.18
asian: 0.18
Kale: 0.18
Batman: 0.17
غاز: 0.17
Batman: 0.17
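Dashboards of this kind commonly build both logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values. The sketch below assumes that procedure. It is not necessarily how these particular numbers were produced, and `decoder_dir`, `unembed`, `feature_logit_effects`, and `top_and_bottom` are hypothetical names used only for illustration.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Project a feature's decoder direction through the unembedding (assumed method).

    decoder_dir: length-d_model vector (hypothetical decoder row for this feature).
    unembed: d_model x vocab matrix (hypothetical unembedding), as a list of rows.
    Returns one logit contribution per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        # Dot product of the decoder direction with column t of the unembedding.
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    logits: Sequence[float], tokens: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, value) pairs."""
    # Token indices sorted from most negative to most positive contribution.
    order = sorted(range(len(logits)), key=lambda t: logits[t])
    negative = [(tokens[t], logits[t]) for t in order[:k]]
    # Take the last k indices and reverse them so the largest value comes first.
    positive = [(tokens[t], logits[t]) for t in reversed(order[-k:])]
    return positive, negative
```

With a real model, `k=10` would give two ten-row lists like the ones above. Sorting once and slicing both ends keeps the positive and negative rankings consistent with each other.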
Activation Density: 0.020%
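A density of 0.020% means the feature is active on roughly 2 of every 10,000 tokens, which is about 1 in 5,000. The sketch below assumes density is the percentage of token positions where the activation is strictly above zero. The `activation_density` function and its `threshold` parameter are hypothetical and are not taken from any particular tool.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # a position counts as "active" only above the threshold
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

For example, passing 9,998 zeros followed by two positive values yields approximately 0.02, which matches the 0.020% shown here.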