INDEX
Explanations
mentions of influential political figures and educational institutions
Negative Logits
ereotype    -0.15
allo        -0.14
utsch       -0.14
nilai       -0.14
ando        -0.13
intl        -0.13
inka        -0.13
.magic      -0.13
ore         -0.13
ording      -0.12
Positive Logits
#aa             0.15
ç·Ĵ             0.14
498             0.14
RequestOptions  0.13
884             0.13
lasses          0.13
ownik           0.13
Rodgers         0.13
áce             0.13
saf             0.12
Activations Density: 0.034%