Explanations
references to religious and ethnic minorities
Negative Logits
peria         -0.16
Bald          -0.16
uben          -0.14
bald          -0.14
ecessarily    -0.14
egas          -0.14
.bc           -0.14
लत            -0.14
bounce        -0.13
èIJ½ãģ¡       -0.13
Positive Logits
minority      0.27
minorities    0.23
Minority      0.22
Minor         0.21
minor         0.21
eth           0.19
_eth          0.18
Minor         0.18
minor         0.18
Eth           0.17
Activations Density 0.083%
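
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name activation_density, and the sample data are all assumptions made for illustration; they are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition for illustration only: a position counts as
    "active" when its activation is strictly greater than the threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 of 1200 token positions.
sample = [0.0] * 1199 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.083%
```

Under this assumed definition, a density of 0.083% would correspond to the feature firing on roughly one token in every 1200.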