Explanations
terms related to specific types of human identification, such as race and gender
suffixes indicating specific characteristics or attributes
Negative Logits
ĸļ          -0.82
perty       -0.77
otation     -0.70
owitz       -0.69
HAEL        -0.66
arcity      -0.65
hitting     -0.63
showc       -0.62
earch       -0.62
senal       -0.62
Positive Logits
kees         0.67
gentleman    0.65
bush         0.63
hood         0.62
ALLY         0.61
Gate         0.61
lihood       0.60
baum         0.58
Protestant   0.58
boy          0.58
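The page does not say how the two logit lists are computed. A common convention for feature dashboards is to project the feature's decoder direction through the model's unembedding matrix, then list the tokens whose logits it raises most (positive) and lowers most (negative). The sketch below assumes that convention; the names `logit_weights`, `top_and_bottom`, `decoder_dir` and `unembed`, and the toy numbers, are hypothetical and are not taken from this dashboard.

```python
from typing import List, Tuple


def logit_weights(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Direct effect of a feature direction on every vocabulary logit.

    Assumed (hypothetical) layout: unembed[t] is the d_model-length
    unembedding vector for token t, so the effect on token t is the dot
    product of the feature's decoder direction with that vector.
    """
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_and_bottom(
    vocab: List[str], scores: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return positive, negative


# Toy example: 4-token vocabulary, d_model = 2 (made-up values).
vocab = ["boy", "gentleman", "perty", "owitz"]
decoder_dir = [1.0, 0.0]
unembed = [[0.5, 0.1], [0.7, 0.2], [-0.8, 0.3], [-0.6, 0.0]]

scores = logit_weights(decoder_dir, unembed)
positive, negative = top_and_bottom(vocab, scores, k=2)
print(positive)  # [('gentleman', 0.7), ('boy', 0.5)]
print(negative)  # [('perty', -0.8), ('owitz', -0.6)]
```

Under this reading, each list above is simply the top ten and bottom ten entries of one vector of per-token scores.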
Activation density: 0.191%
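Activation density is commonly read as the percentage of sampled token positions on which the feature's activation exceeds zero. The page does not define it, so treat that definition as an assumption. Under it, 0.191% means the feature fires on roughly 19 of every 10,000 tokens. A minimal sketch follows; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`.

    Assumes `activations` holds one activation value per sampled token position.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 1,000 token positions, of which the feature fires on 2.
acts = [0.0] * 1000
acts[17] = 3.2
acts[512] = 0.9
print(activation_density(acts))  # 0.2
```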