INDEX
Explanations
references to diverse ethnic backgrounds and identities
Negative Logits
éĤ¦        -0.15
onta       -0.15
arat       -0.15
oden       -0.15
avers      -0.14
Terraria   -0.14
lider      -0.13
fak        -0.13
rikes      -0.13
oko        -0.13
Positive Logits
descent    0.40
decent     0.35
heritage   0.33
-born      0.32
born       0.31
-desc      0.30
born       0.29
desc       0.28
origin     0.28
-des       0.27
Activations Density 0.086%
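
The density figure is not defined on this page. A minimal sketch, assuming it is the fraction of sampled token positions at which the feature's activation is above zero; the function name `activation_density`, the threshold, and the sample data are all hypothetical and chosen only to illustrate the arithmetic:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold.

    Assumption: "density" means the share of positions at which the
    feature fires at all; the dashboard itself does not state this.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 86 active positions out of 100,000 tokens.
sample = [1.0] * 86 + [0.0] * 99_914
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.086%
```

Under that assumption, a density of 0.086% corresponds to the feature firing on roughly 86 of every 100,000 tokens.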