INDEX
Explanations
references to middle-class socioeconomic status
Negative Logits
ickle       -0.16
shed        -0.15
soever      -0.15
radi        -0.15
.Factory    -0.15
Harden      -0.15
dsl         -0.15
acle        -0.14
chema       -0.14
isphere     -0.14
Positive Logits
-upper      0.18
wares       0.18
-aged       0.17
hausen      0.17
Ages        0.17
most        0.17
sex         0.16
/end        0.15
-middle     0.15
воз         0.15
Activations Density: 0.017%